Straightforward Inference of Ancestry and Admixture

This study presents a set of 46 ancestry-informative insertion deletion polymorphisms (AIM-INDELs) designed to efficiently measure population admixture proportions from African, European, East Asian, and Native American origins. The developed multiplexed genotyping method simplifies the process compared to traditional AIM-SNP typing, allowing for straightforward analysis and improved accuracy in inferring ancestry and estimating proportions. The results demonstrate the effectiveness of the AIM-INDEL set in both individual ancestry inference and population-level assessments.
Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-Informative Insertion Deletion Multiplexing

Rui Pereira1,2, Christopher Phillips2, Nádia Pinto1,3,4, Carla Santos2, Sidney Emanuel Batista dos Santos5,
António Amorim1,3, Ángel Carracedo2,6, Leonor Gusmão1*
1 IPATIMUP – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal, 2 Institute of Forensic Sciences Luis Concheiro, University of
Santiago de Compostela, Santiago de Compostela, Spain, 3 Faculty of Sciences, University of Porto, Porto, Portugal, 4 Mathematics Research Centre, University of Porto,
Porto, Portugal, 5 Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Belém, Brazil, 6 Genomics Medicine Group, CIBERER, University of Santiago de
Compostela, Santiago de Compostela, Spain

Abstract
Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically
distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or
estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great
interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control
association studies, but also in population and forensic genetics studies. This work presents a set of 46 ancestry-informative
insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins
(African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs)
through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach. HGDP-
CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of
the assay in clustering populations from different continental origins and to establish reference databases. In addition, other
populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results
revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good
estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of
46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available
AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping
technologies.

Citation: Pereira R, Phillips C, Pinto N, Santos C, Santos SEBd, et al. (2012) Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-
Informative Insertion Deletion Multiplexing. PLoS ONE 7(1): e29684. doi:10.1371/journal.pone.0029684
Editor: Manfred Kayser, Erasmus University Medical Center, The Netherlands
Received August 22, 2011; Accepted December 2, 2011; Published January 17, 2012
Copyright: © 2012 Pereira et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was partially supported through PhD grants to RP (SFRH/BD/30039/2006), NP (SFRH/BD/37261/2007), and CS (SFRH/BD/75627/2010)
awarded by the Portuguese Foundation for Science and Technology (FCT) and co-financed by the European Social Fund (Human Potential Thematic Operational
Programme). This work was also partially supported by the project Xunta de Galicia INCITER09 208 163 PR (PI: Victoria Lareu). IPATIMUP is an Associate Laboratory
of the Portuguese Ministry of Education and Science and is partially funded by FCT. The funders had no role in study design, data collection and analysis, decision
to publish, or preparation of the manuscript. No additional external funding received for this study.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected]

Introduction

Initial studies of human genetic variation focused on Short Tandem Repeats (STRs) and Single Nucleotide Polymorphisms (SNPs) [1,2], and only later explored Copy Number Variants (CNVs) [3–6] and Insertion Deletion Polymorphisms (INDELs) [7–9], unveiling previously unknown sources of genetic diversity that are likely to be important factors underlying inherited traits and diseases in humans. Moreover, advances in genotyping technologies have allowed progressively higher genome coverage using resources within the normal scope of most genetics laboratories. These developments have led to an increase in Genome Wide Association Studies (GWAS) in the search for genetic variants associated with a wide range of complex diseases and phenotypic traits including, for example, obesity, schizophrenia, autism, diabetes, height, eye and skin color [10–12]. These investigations have identified a large number of candidate gene variants showing strong association with specific conditions or phenotypes, and subsequent replication studies and meta-analyses have strengthened or weakened these initial findings. One of the major problems in case-control association studies is the presence of undetected population structure, which can lead to false positive associations when an excess of ancestry-differentiated markers stratifies the case and the control groups. Alternatively, false negative results may occur if real but weak associations are missed while greater allele frequency differentiation exists between study and control groups due to differences in ancestry [13,14]. Therefore, association studies must be accompanied by an evaluation and correction of the possible effects of population structure between both sample groups. In recent years the prevailing strategies to overcome the dangers of population stratification use genomic control to measure the possible effects of

PLoS ONE | www.plosone.org 1 January 2012 | Volume 7 | Issue 1 | e29684


Ancestry-Informative INDELs

stratification and correct for such effects using methods that infer genetic ancestry, each with particular pros and cons [14–16]. Structured association approaches involve inferring genetic ancestry of individuals in subpopulation clusters using programs like STRUCTURE [17], and association tests are then assessed correcting for individual admixture [18]. Principal component analysis (PCA) can also be applied to genetic data to infer population structure, using the top components as covariates to correct for stratification in GWAS [19]. Another strategy that has been considered is genetic matching, in which cases and controls are matched for genetic ancestry, as assessed by one of the strategies described above [14,20]. In GWAS, using data from a large number of random genetic markers is by itself sufficient and preferable to achieve good ancestry estimates to use in subsequent correction. Nevertheless, when genome wide data are not available and only a few loci are studied, such as broad-scale follow-up studies focused on regions showing associations (Phase II), a proper correction for stratification can be achieved using compact panels of ancestry-sensitive or ancestry-informative markers (AIMs) [14,21].

AIMs show high allele frequency divergence between different ancestral or geographically distant populations and are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. Ancestry information can then be used to perform genetic matching or correct substructure effects in case-control association studies. In the population genetics field AIMs are used mainly to estimate ancestry proportions in admixed populations and assess the structure of those populations. Furthermore, AIMs are of great interest in forensic genetics, with the potential to provide an intelligence tool in criminal investigations. In the absence of any other investigative leads, AIM genotypes obtained from evidential material could indicate the likely ancestry of the donor, and therefore help direct the course of investigations [22–25].

In recent years several studies have been published reporting AIM sets varying greatly in the type of polymorphism, the number of loci involved and the genotyping strategies, ranging from simple PCR followed by capillary electrophoresis (e.g. INDEL sets) to more laborious and resource-intensive technologies (e.g. SNP typing by SNaPshot and TaqMan assays). The reported AIM sets have also focused attention on different population group comparisons, depending on the ancestral contributors to the admixed populations under study, or otherwise comprise more generic panels aimed at efficient population differentiation at the continental level. The great majority of AIM panels described to date use SNPs and only a minority apply STRs [26] or INDELs [27].

In this study we followed an approach that brings together highly informative short binary INDELs that combine the desirable characteristics of the other genetic markers most commonly used [7–9,27–29]. INDELs are length polymorphisms easily genotyped by fragment size differentiation (in similar fashion to widely established STR typing), whereas SNPs require determination of the polymorphic base through more complex direct or indirect sequencing methods. In brief, AIM-INDELs can offer the same potential as AIM-SNP assays for ancestry detection, but have the advantage of being very simply genotyped through a PCR followed by direct capillary electrophoresis of the amplified products - a system easily implemented by any laboratory with capillary analyzers. The simplicity of the INDEL approach delivers ease-of-use, time and cost effectiveness, and, most important in forensic analysis, considerably reduces the steps involved in the genotyping of an ancestry-informative biallelic marker set in comparison with AIM-SNPs. The direct workflow minimizes manipulation, risks of contamination or sample mix-ups, and reduces to a minimum the number of variables affecting the end result. Furthermore, the direct fluorescence signals of INDEL alleles allow for mixture detection, providing a considerable additional benefit over AIM-SNPs assayed by SNaPshot.

In this study a set of 46 AIM-INDELs was selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). We have optimized the multiplexed genotyping of the 46 AIMs in a simple and informative assay, enabling a more straightforward alternative to AIM-SNP typing methods dependent on multi-step protocols or implementation of genotyping technologies that are expensive, complex and platform-dependent. In addition, we established reference databases using the HGDP-CEPH diversity panel samples [30] from the above four population groups and assessed the efficiency of the assay in inferring the ancestry of individuals from different test populations and estimating ancestry proportions at the individual and population level in an example admixed population.

Materials and Methods

Ethics Statement

The current study was approved by the Institute of Molecular Pathology and Immunology of the University of Porto institutional review board. Besides the HGDP-CEPH diversity panel human cell line samples, all other samples involved in the study are long-lasting anonymized DNA extracts previously obtained with informed written consent from healthy individuals for research purposes.

Population samples

A total of 1002 DNA samples were used in this study comprising: i) reference samples from the HGDP-CEPH diversity panel standardized subset H952 [30,31] with origin in Africa (AFR), Europe (EUR), East Asia (EAS), America (NAM) and also Oceania (OCE), representing a total of 584 individuals from 40 populations. Individuals 1219, 1339, 1344 and 1041 were not included in the study since no DNA was available for analysis; in substitution of 1041 we used 1042, who had been excluded from subset H952 due to a parent/offspring relationship with 1041 [31]; ii) samples from Angola (48), Portugal (48), Taiwan (48) and Brazilian Amazonas tribes (48) used in a preliminary evaluation of the AIM-INDEL assay and as example testing samples; iii) samples from the city of Belém (226), an admixed population in northeastern Amazonas, Brazil.

AIM-INDEL selection and development of the multiplex reaction

An initial pool of candidate INDELs was assembled by collecting previously available population data on this type of polymorphism included in the Marshfield Diallelic Insertion/Deletion Polymorphisms database website (http://www.marshfieldclinic.org/mgs/; [7]) and from later studies that also characterized some candidate INDELs in different population groups [27,28,32,33]. Considering the allele frequency data compiled from the diverse sources, all markers were sorted according to frequency differentials (δ) [34] comparing four human population groups of Africans, Europeans, East Asians and Native Americans. For this study we selected a set of 46 markers (Table 1) among the most informative INDELs for each population group (all with δ ≥ 0.40 between at least two groups) and optimized a unique multiplex reaction allowing the simulta-


Table 1. AIM-INDELs used in the multiplex.

MID* rs number Chromosome Position (bp)** Alleles described in dbSNP References

MID-1470 rs2307666 11 64729920 -/GTTAC [7,27,33]


MID-777 rs1610863 16 6551830 -/GAA [7]
MID-196 rs16635 6 99789775 -/CAT [7,27,32]
MID-881 rs1610965 5 79746093 -/ACTT [7,32]
MID-3122 rs35451359 18 45110983 -/ATCT [7]
MID-548 rs140837 6 3708909 -/CT [7]
MID-659 rs1160893 2 224794577 -/CT [7]
MID-2011 rs2308203 2 109401291 -/CTAGA [7,27]
MID-2929 rs33974167 8 87813725 -/TA [7]
MID-593 rs1160852 6 137345857 -/TT [7]
MID-798 rs1610884 5 56122323 -/GGGAAA [7,32]
MID-1193 rs2067280 5 89818959 -/AT [7]
MID-1871 rs2308067 7 127291541 -/TT [7]
MID-17 rs4183 3 3192524 -/TAAC [7,32]
MID-2538 rs3054057 15 86010538 -/AACA [7]
MID-1644 rs2307840 1 36099090 -/GT [7,32]
MID-3854 rs60612424 6 84017514 -/TCTA [7]
MID-2275 rs3033053 14 42554496 -/TCAGCAG [7]
MID-94 rs16384 22 42045009 -/AAC [7,33]
MID-3072 rs34611875 18 67623917 -/GCCCCCA [7]
MID-772 rs1610859 5 128317275 -/TAG [7]
MID-2313 rs3045215 1 234740917 -/ATTATAACT [7,32]
MID-397 rs25621 6 139858158 -/TTCT [7]
MID-1636 rs2307832 1 55590789 -/AA [7,32]
MID-51 rs16343 4 17635560 -/TTTAT [7,32]
MID-2431 rs3031979 8 73501951 -/ATTG [7]
MID-2264 rs34122827 13 63778778 -/AAGT [7]
MID-2256 rs133052 22 41042364 -/CAT [7,32]
MID-128 rs6490 12 108127168 -/ATT [7]
MID-15 rs4181 2 42577803 -/AAATACACAC [7,32]
MID-2241 rs3030826 6 67176774 -/GTCCAATA [7,32]
MID-419 rs140708 6 170720016 -/AATGGCA [7,32]
MID-943 rs1611026 5 82545545 -/TGAT [7]
MID-159 rs16438 20 25278470 -/CCCCA [7]
MID-2005 rs2308161 10 69800909 -/AACAAT [7,33]
MID-250 rs16687 7 83887882 -/CA [7,32]
MID-1802 rs2307998 5 7814345 -/GGA [7]
MID-1607 rs2307803 3 108981031 -/TG [7]
MID-1734 rs2307930 6 84476378 -/CCAT [7]
MID-406 rs25630 6 14734341 -/AG [7]
MID-1386 rs2307582 1 247768775 -/AAACTATTCATTTTTCACCCT [7,27]
MID-1726 rs2307922 1 39896964 -/CAAGAACTATAAT/CACTATCTATTAT [7,27,32]
MID-3626 rs11267926 15 45526069 -/AATATAATTTCTCCA [7]
MID-360 rs25584 12 112145217 -/AA [7]
MID-1603 rs2307799 5 70828427 -/TTGT [7,27,33]
MID-2719 rs34541393 20 30701405 -/AACT [7,28]

*Nomenclature according to [7] and Marshfield Diallelic Insertion/Deletion Polymorphisms database;


**Mapping data according to dbSNP (build 132).
doi:10.1371/journal.pone.0029684.t001
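
The marker selection criterion described in the methods (ranking candidates by allele frequency differential δ and retaining those with δ ≥ 0.40 between at least two population groups) can be illustrated with a minimal Python sketch. The marker names and allele frequencies below are hypothetical values invented for illustration, not data from this study, and the helper `max_delta` is an assumed name rather than part of any published tool.

```python
from itertools import combinations

def max_delta(freqs):
    """Return the largest pairwise |p_i - p_j| and the pair of groups giving it.

    freqs maps a population group to the frequency of one allele
    (e.g. the short allele) of a biallelic marker.
    """
    best_pair = max(combinations(sorted(freqs), 2),
                    key=lambda pair: abs(freqs[pair[0]] - freqs[pair[1]]))
    return abs(freqs[best_pair[0]] - freqs[best_pair[1]]), best_pair

# Hypothetical short-allele frequencies (illustrative values only).
candidates = {
    "marker_A": {"AFR": 0.10, "EUR": 0.85, "EAS": 0.80, "NAM": 0.75},
    "marker_B": {"AFR": 0.50, "EUR": 0.55, "EAS": 0.45, "NAM": 0.60},
}

THRESHOLD = 0.40  # delta cut-off quoted in the text

# Keep markers whose best pairwise differential reaches the threshold.
selected = [name for name, freqs in candidates.items()
            if max_delta(freqs)[0] >= THRESHOLD]
print(selected)
```

In this toy input only the first marker passes, because its African and European frequencies differ by about 0.75 while the second marker never exceeds 0.15 between any two groups.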


neous genotyping of all AIMs in a single PCR and electrophoretic run. The multiplex development followed a similar workflow as in Pereira et al. [28,35] except for the accommodation of certain longer amplicons into a broadened size window up to 230 bp in order to type an extended number of INDELs in a single reaction.

Amplification and genotyping

PCR amplification of the 46 AIM-INDELs used the QIAGEN Multiplex PCR kit (Qiagen) at 1× Qiagen multiplex PCR master mix, 0.1 μM of all primers (sequence details in Table S1) and 0.3–5 ng of genomic DNA in a 10 μL final reaction volume. Thermocycling conditions were: initial step at 95°C for 15 min; 30 cycles at 94°C for 30 sec, 60°C for 90 sec, and 72°C for 45 sec; and a final extension at 72°C for 60 min. The PCR products were then prepared for capillary electrophoresis (CE) by adding 1 μL of amplified product to 10 μL Hi-Di™ Formamide (Applied Biosystems) and 0.3 μL of GeneScan™ 500 LIZ® size standard (Applied Biosystems). CE was performed using a 3130 Genetic Analyzer prepared with DS-33 matrix standard, POP-7™ polymer and applying virtual filter G5 (Applied Biosystems). The electropherograms were analyzed and genotypes were automatically assigned with GeneMapper v4.0 (Applied Biosystems). For practical reasons INDEL short alleles were coded as 1 and long alleles as 2.

Statistical analysis

Estimation of allele frequencies, exact tests of Hardy-Weinberg equilibrium (HWE), FST genetic distances and linkage disequilibrium tests were assessed using Arlequin v3.5.1.2 [36]. Ancestry inferences were performed using STRUCTURE v2.3.3 [17,37] with a burnin length of 100,000 followed by 100,000 MCMC repetitions, and a variety of parameter sets were tested depending on the objective of the analysis. Initial runs were made without any prior information on the origin of samples, using the "Admixture Model" and considering either correlated or independent "Allele Frequency Models"; a minimum of 3 independent runs were performed for each testing K value, ranging from K = 1 to K = number of presumed clusters present in the dataset plus three. The estimated ln probability of data (−lnP(D)) values were plotted using Structure harvester v0.6.6 (http://taylor0.biology.ucla.edu/structureHarvester/). In a second phase, when using reference samples as training sets to test for "unknown" individuals or populations, STRUCTURE analyses were carried out using the same parameters as before or selecting the "Use Population Information" option. In these cases, allele frequencies were updated using only the reference individuals with POPFLAG = 1 data (option under the Advanced tab). Here, 3 independent runs were performed only for the appropriate number of clusters, as evaluated by the initial analysis. Unless otherwise indicated, results are presented for the default settings considering the "Admixture Model" and correlated allele frequencies. CLUMPP v1.1.2 [38] was used to obtain the average permutated individual and population Q-matrices throughout the three replicates for each K value. Those matrices were used as input to distruct v1.1 [39] to obtain bar plots where each individual is represented as a segment divided into K colors that represent the estimated membership coefficients from each cluster.

Principal component analysis (PCA) was performed as an additional and independent approach to estimate the number of populations present in the data set. We used R 2.11.1 [40] with SNPassoc package [41] to obtain two and three dimensional graphics and the information percentage values associated to each principal component.

The efficiency of the 46 AIM-INDEL set for assigning individuals to population groups was further evaluated by one-out cross-validation based on a flexible single profile analysis system very similar to STRUCTURE, calculating likelihood ratio values obtained with a Bayesian classification algorithm implemented in the "Snipper app suite" website (http://mathgene.usc.es/snipper/; [23]).

Results

A simple and informative multiplex was developed for the simultaneous analysis of 46 AIM-INDELs reported to have high δ values between the AFR, EUR, EAS or NAM population groups. All markers were analyzed in short fragments (<230 bp) through a single PCR followed by capillary electrophoresis (Figure 1). The workflow of the INDEL assay is straightforward, reducing considerably the steps and resources needed to genotype a large set of biallelic AIMs.

After optimization of the method we created a database including HGDP-CEPH diversity panel genetic data, commonly used by the research community as reference populations for the four groups AFR, EUR, EAS, NAM and also from Oceania (complete database included in File S1).

Genetic characterization of reference populations

Patterns of INDEL variability observed in the HGDP-CEPH samples from the population groups AFR, EUR, EAS and NAM are detailed in Table S2 as well as δ and pairwise FST for each marker. With few exceptions, the vast majority of the INDELs show high allele frequency differentials and genetic distances between at least two groups (39 with δ ≥ 0.4 and 44 with δ ≥ 0.3). No significant departures from Hardy-Weinberg equilibrium were found in the studied populations and pairwise linkage disequilibrium exact tests did not detect significant associations within the marker set.

One interesting finding was the occurrence of an unexpected third allelic state (coded as allele 3) for MID360 and MID2264. Sequencing analysis confirmed our observations as a result of additional sequence length variants within the amplicon fragments. For MID360 the third allelic state observed is due to a T insertion associated with the short allele, 8 bases downstream of the targeted polymorphism (allele 1D8Tins). Conversely for MID2264, allele 3 corresponds to a T deletion occurring in the long allele background (allele 2D68Tdel). Interestingly, the MID360 variant alleles were only found in AFR samples whereas the MID2264 variants seemed specific of EUR, further contributing to the differentiation of the two groups.

Inferring genetic ancestry

- The AIM-INDEL panel efficiently distinguishes four major population groups. Before implementing the HGDP-CEPH diversity panel reference database, a preliminary evaluation on the performance of a panel comprising 44 INDELs at the time (without MID94 and MID1734) had been performed using 48 samples with origin in each of the four groups under study (detailed results in Figure S1). In brief, analyses with STRUCTURE, PCA and one-out cross validation clearly supported the efficiency of the panel in clustering individuals into four population groups.

The results obtained for the complete AIM-INDEL panel with HGDP-CEPH AFR, EUR, EAS and NAM populations strongly corroborate these preliminary findings (Figure 2). STRUCTURE ancestry estimates considering K = 4 still produce an enhancement in −ln P(D) values while a plateau is reached thereafter, which


Figure 1. Example of an electropherogram obtained for the HGDP-CEPH 0452 sample with the 46 AIM-INDEL multiplex (markers
are identified by MID number).
doi:10.1371/journal.pone.0029684.g001

points to 4 as the smallest K number capturing the major population structure in the data and supports the inference that a four group clustering better fits the genetic data (Figure 2A and 2B).

PCA for the same dataset allows an independent non-model based view of the individual clustering. The first three PCs define approximately half of the variance in the dataset (46.1%) yet allow a clear spatial separation of four different groups (Figure 2C). Likewise, cross-validation studies (Figure 2D) revealed the INDEL panel to show a high accuracy of population assignment, with a global classification error of 1.26% (specifically 7 of 556). All AFR, EUR and NAM were correctly assigned whereas misclassified individuals were all from the Yakut population in Siberia except for one individual from Oroquen, China.

- HGDP-CEPH genetic data as reference genotypes to test individuals or populations of unknown origin. Reference HGDP-CEPH diversity panel genetic data from the four population groups (AFR, EUR, EAS and NAM) was used to estimate ancestry proportions of individuals/populations from different geographic locations. We tested samples from Angola, Portugal, Taiwan and Brazil (Amazonas Amerindian tribes and Belém, a northeastern Amazonas city). The individual and global admixture estimates obtained with genetic data only (no prior population information) correspond well with expected patterns, knowing the origin of the subjects (Figure 3; Table 2). In general, individuals from the non-admixed populations show high membership proportion in the same cluster as HGDP-CEPH representatives of the same population group. In contrast individuals from Belém show highly variable admixture patterns mainly of European, Native American and African origin (Figure 3; Table 2), resulting in average ancestry proportions of 53.5% EUR, 22.9% NAM, 14.8% AFR and 8.8% EAS. Considering the historical formation and peopling of Brazil in which there were three main contributing ancestral populations (NAM, EUR and AFR) we performed a three-group analysis for the particular case of Belém, specifically excluding EAS and using only NAM, EUR and AFR ancestral groups with K = 3 (Figure S2). In particular the Native American proportion increased (53.7% EUR, 29.5% NAM and 16.8% AFR), having captured most of the previous East Asian component.

- Indications of population differentiations beyond four groups from inclusion of Oceanians. The AIM-INDEL panel was primarily designed as a tool for ascertaining ancestry from four major population groups. Nonetheless, as there is general interest in AIM panels able to distinguish populations at the broader continental level, we extended our study to HGDP-CEPH Oceanian samples and assessed the ability of the panel to differentiate populations with origin in all five continent regions. Following the same evaluation strategy as before, the assay proved to consistently recognize a fifth cluster corresponding to Oceanians and that K = 5 captures most of the structure in the dataset (Figure 4; Figure S3 for details). PCA plots (Figure S3C) show most HGDP-CEPH Oceanians form a distinguishable cloud lying between EUR and EAS even though the separation is not perfectly achieved. In a five-group classification, the one-out cross validation error rate increased slightly to 1.54% (9/584). The assignment of Oceanians was accurately made but two EAS (from Cambodia) were now misclassified as OCE.
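
The two cross-validation error rates quoted in the results follow directly from the reported counts (7 of 556 for the four-group classification and 9 of 584 for the five-group classification). A short Python check of that arithmetic is sketched below; the function name `error_rate_percent` is only an illustrative choice.

```python
def error_rate_percent(misclassified, total):
    """Classification error as a percentage, rounded to two decimals."""
    return round(100 * misclassified / total, 2)

# Counts as reported in the text.
four_group = error_rate_percent(7, 556)
five_group = error_rate_percent(9, 584)
print(four_group, five_group)
```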


Figure 2. Analysis of HGDP-CEPH diversity panel samples from four continental origins using a set of 46 AIM-INDELs. A) ancestral membership proportions (based on STRUCTURE results from 3 independent runs treated in CLUMPP and plotted with distruct; individuals were first sorted by geographic origin of population, and within those by ascending population code and HGDP individual number); B) estimated ln probability of the data (−ln P(D) obtained with STRUCTURE and plotted using Structure harvester); C) principal component analysis 3D plots; D) estimation of population assignment success (results from one-out cross validation studies using the Snipper app suite; see methods for details of the analyses). AFR: Africa; EUR: Europe; EAS: East Asia; NAM: Native America.
doi:10.1371/journal.pone.0029684.g002
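
The one-out cross validation summarized in panel D can be sketched in Python under simplifying assumptions. This is not the Snipper implementation: it is a generic leave-one-out classifier that assumes Hardy-Weinberg genotype proportions, independent markers, equal prior probabilities for all groups, and add-one smoothing of allele frequencies. The toy genotypes are invented for illustration, with alleles coded 1 (short) and 2 (long) as in the methods.

```python
import math
from collections import defaultdict

def allele_freqs(genotypes, n_markers):
    """Frequency of allele 2 per marker, with add-one smoothing to avoid zeros."""
    freqs = []
    for m in range(n_markers):
        count_2 = sum(g[m].count(2) for g in genotypes)
        freqs.append((count_2 + 1) / (2 * len(genotypes) + 2))
    return freqs

def log_likelihood(genotype, freqs):
    """Log-probability of a multilocus genotype under Hardy-Weinberg proportions."""
    total = 0.0
    for (a, b), q in zip(genotype, freqs):
        p = 1.0 - q
        if a == b:
            prob = q * q if a == 2 else p * p
        else:
            prob = 2 * p * q
        total += math.log(prob)
    return total

def leave_one_out_error(samples):
    """samples: list of (group, genotype); genotype is a tuple of allele pairs."""
    errors = 0
    n_markers = len(samples[0][1])
    for i, (true_group, genotype) in enumerate(samples):
        # Rebuild the reference frequencies without the tested individual.
        by_group = defaultdict(list)
        for j, (group, other) in enumerate(samples):
            if j != i:
                by_group[group].append(other)
        scores = {g: log_likelihood(genotype, allele_freqs(gs, n_markers))
                  for g, gs in by_group.items()}
        if max(scores, key=scores.get) != true_group:
            errors += 1
    return errors / len(samples)

# Toy data (hypothetical): two markers, two groups, three individuals each.
toy = [
    ("AFR", ((1, 1), (1, 1))), ("AFR", ((1, 1), (1, 2))), ("AFR", ((1, 1), (1, 1))),
    ("EUR", ((2, 2), (2, 2))), ("EUR", ((2, 2), (1, 2))), ("EUR", ((2, 2), (2, 2))),
]
print(leave_one_out_error(toy))
```

Removing each individual from the reference set before scoring it keeps the tested genotype from contributing to the allele frequencies it is compared against, which is the point of the one-out design.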

Figure 3. Ancestral membership proportions for testing population samples from different continental origins using the HGDP-
CEPH diversity panel genetic data as training sets. Angola (Africa); Portugal (Europe); Taiwan (East Asia); Brazilian Amazonas tribes (Native
America); Belém is an example of a highly admixed Brazilian city in northeastern Amazonas.
doi:10.1371/journal.pone.0029684.g003


Table 2. Ancestral membership proportions for HGDP-CEPH diversity panel samples and testing populations from four continental origins.

46 AIM-INDELs (this study)
                       AFR    EUR    EAS    NAM
HGDP-CEPH AFR          0.969  0.011  0.012  0.008
HGDP-CEPH EUR          0.008  0.963  0.014  0.014
HGDP-CEPH EAS          0.006  0.018  0.952  0.024
HGDP-CEPH NAM          0.008  0.041  0.027  0.924

210 INDELs [32]
                       AFR    EUR    EAS    NAM
HGDP-CEPH AFR          0.977  0.009  0.009  0.005
HGDP-CEPH EUR          0.007  0.967  0.013  0.013
HGDP-CEPH EAS          0.007  0.021  0.955  0.017
HGDP-CEPH NAM          0.011  0.028  0.015  0.946

48 In4 AIM-SNP set [44]
                       AFR    EURA   EAS    AMI
AFR                    0.97   0.02   0.01   0.01
EURA                   0.01   0.96   0.02   0.01
EAS                    0.01   0.04   0.91   0.03
AMI                    0.01   0.03   0.04   0.92

Testing populations, 46 AIM-INDELs (this study)
                       AFR    EUR    EAS    NAM
Angola                 0.970  0.011  0.011  0.008
Portugal               0.018  0.966  0.008  0.008
Taiwan                 0.004  0.003  0.984  0.009
Br. Amazonas tribes    0.010  0.013  0.032  0.945
Belém (4G-analysis)    0.148  0.535  0.088  0.229
Belém (3G-analysis)    0.168  0.537  -      0.295

(AFR: Africa; EUR: Europe; EAS: East Asia; NAM: Native America).
doi:10.1371/journal.pone.0029684.t002
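
Because membership proportions for a population are expected to sum to one across clusters, the rows of Table 2 can be sanity-checked with a few lines of Python. The sketch below uses the testing-population values exactly as printed in the table; the tolerance of 0.002 is an assumption chosen to absorb rounding to three decimals, and the helper name `sums_to_one` is illustrative.

```python
# Testing-population rows from Table 2 (AFR, EUR, EAS, NAM); the Belém
# three-group row has no EAS component, so it holds three values.
table2_testing = {
    "Angola": (0.970, 0.011, 0.011, 0.008),
    "Portugal": (0.018, 0.966, 0.008, 0.008),
    "Taiwan": (0.004, 0.003, 0.984, 0.009),
    "Br. Amazonas tribes": (0.010, 0.013, 0.032, 0.945),
    "Belém (4G-analysis)": (0.148, 0.535, 0.088, 0.229),
    "Belém (3G-analysis)": (0.168, 0.537, 0.295),
}

def sums_to_one(proportions, tol=0.002):
    """True when the membership proportions add up to 1 within rounding error."""
    return abs(sum(proportions) - 1.0) <= tol

checks = {name: sums_to_one(row) for name, row in table2_testing.items()}
print(checks)
```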

Discussion

The main objective of this study was to provide a simple tool for inferring ancestry and estimating admixture proportions from four different population origins that can be widely applied to genetic studies. We describe a new AIM assay comprising 46 INDELs that are simply analyzed in a multiplex PCR followed by CE detection. With this approach we were able to combine the ancestry informative power of biallelic markers (exemplified by AIM-SNP panels) with the simplified analysis based on fragment size separation (as in STR typing). The methodology of the assay is straightforward and can be readily and inexpensively implemented in any molecular genetics laboratory. In contrast, the majority of AIM sets published in recent years involve more complex genotyping protocols or are limited to specific platforms not available to all laboratories and therefore requiring additional resources [e.g. 23,27,42–49]. Another important aspect is that some AIM sets are directed to differentiate specific population groups depending on the main ancestral contributors to the individuals or populations under investigation [e.g. 23,27,50]. We aimed to develop a generic panel, designed to target the four major population groups of AFR, EUR, EAS and NAM, similarly to Halder et al. [43]. Our objective was to balance combining the highest number of AIMs possible into a single reaction with use of amplicon lengths suitable for the analysis of low quality DNA. The limitation of large multiplex reliability restricts the maximum number of markers in a single reaction. On the other hand, AIMs have an important application in forensic investigations where the quantity and quality of the samples are often limiting factors. We were eventually able to multiplex 46 highly informative INDELs, with a scope of markers comparable to other AIM sets reported [23,27,46–48]. Kosoy et al. [44] have shown that small AIM sets can distinguish major population groups and correct for false positive results in association studies. Other studies have addressed ancestry prediction of the HGDP-CEPH samples using large-scale SNP datasets obtained with high-throughput microarrays, and have also evaluated the performance of small subsets of markers ascertained following different strategies such as FST, allele differentials δ, informativeness of assignment index In [51] or PCA (e.g. [42,48,52]). These studies have shown that inference of continental ancestry for the HGDP-CEPH panel is quite clear, and can be performed with a relatively small number of SNPs (10

Figure 4. Ancestral membership proportions for HGDP-CEPH diversity panel samples from five continental origins using a set of 46
AIM-INDELs (based on STRUCTURE results from 3 independent runs treated in CLUMPP and plotted with distruct; individuals were
first sorted by geographic origin of population, and within those by ascending population code and HGDP individual number). AFR:
Africa; EUR: Europe; EAS: East Asia; NAM: Native America; OCE: Oceania.
doi:10.1371/journal.pone.0029684.g004
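
For reference, the marker ascertainment statistics named in the Discussion (the allele frequency differential δ, FST, and the informativeness for assignment In [51]) can be illustrated for a single biallelic marker. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the function names and the example frequencies are hypothetical, FST is computed as the simple heterozygosity ratio (HT - HS)/HT with equal population weights rather than with any particular software estimator, and In uses natural logarithms.

```python
import math


def delta(p1, p2):
    """Absolute allele frequency differential between two populations."""
    return abs(p1 - p2)


def fst_biallelic(freqs):
    """Heterozygosity-based FST, (HT - HS) / HT, for one biallelic marker.

    `freqs` holds the frequency of one allele in each population
    (equal weights assumed for illustration).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # marker is monomorphic across all populations
    h_within = sum(2 * p * (1 - p) for p in freqs) / k
    return (h_total - h_within) / h_total


def informativeness_in(freqs):
    """Informativeness for assignment (In) for one biallelic marker.

    Sums, over both alleles, -p_mean*ln(p_mean) + (1/K) * sum_i p_i*ln(p_i),
    where p_mean is the allele frequency averaged over the K populations.
    """
    k = len(freqs)
    total = 0.0
    for allele_freqs in (list(freqs), [1 - p for p in freqs]):
        p_mean = sum(allele_freqs) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        total += sum(p * math.log(p) for p in allele_freqs if p > 0) / k
    return total


# Hypothetical insertion-allele frequencies in four reference groups.
example = {"AFR": 0.92, "EUR": 0.15, "EAS": 0.10, "NAM": 0.08}
print("delta AFR vs EUR:", delta(example["AFR"], example["EUR"]))
print("FST over four groups:", fst_biallelic(list(example.values())))
print("In over four groups:", informativeness_in(list(example.values())))
```

For two populations with a fixed allelic difference, this sketch gives δ = 1, FST = 1 and In = ln 2, the maximum for two groups; identical frequencies give zero for all three statistics.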

PLoS ONE | www.plosone.org 7 January 2012 | Volume 7 | Issue 1 | e29684


Ancestry-Informative INDELs

to 50). They also showed that, when using SNPs, it is possible to predict individual ancestry down to the population level, although such approaches require an increased number of markers ranging from several hundred to thousands [48,52]. Due to the multiplexing limitation associated with the number of markers that it is possible to analyze in a single PCR, and in the same respect for other small-scale AIM sets, the 46 AIM-INDEL assay we outline is only going to be particularly useful when broad assignment to continental ancestry is desired, or when estimating admixture proportions in individuals/populations that received ancestral contributions of different continental origins. Assessing within-continent population structure requires much larger arrays of markers, well beyond the number included in our set and in most of the alternative AIM sets, and therefore it will have limited application for that purpose.

The AIM-INDEL assay allowed a rapid and cost-effective genotyping of a large number of samples including HGDP-CEPH individuals from five continental groups (AFR, EUR, EAS, NAM and OCE) and representative testing samples with different origins and admixture levels. From the genetic characterization of the reference ancestral samples we observed a high level of differentiation from the chosen INDELs, as expected from the selection criteria. Although some markers revealed lower differences than those expected from previous data, this is possibly due to the samples representing each group and allele frequency estimation strategies being different [e.g. 7]. The pairwise FST values obtained with the 46 AIM-INDELs (Table S2) are clearly above those usually found at the continental level with random markers [45,53] and similar to those obtained with other AIM panels for the same population groups [44–46].

The results from the HGDP-CEPH diversity panel and other representative populations underlined the capacity of the panel to distinguish four continental population groups. Furthermore, the ancestry estimates obtained in a four-group analysis are very similar to those obtained in Kosoy et al. [44] with a 48 In4 AIM-SNP set for equivalent population groups (Table 2), as well as using a much larger number of INDELs (210) for the same HGDP-CEPH individuals (Table 2; [32]). This concordance in the ancestry estimates highlights the accuracy of the AIM-INDEL panel in inferring ancestry proportions from African, European, East Asian and Native American origin. Furthermore, in spite of the assay being primarily designed for studies considering only four major population groups, extension to five groups revealed the capacity to reliably distinguish Oceanians.

The population assignment cross validation studies based on Bayesian likelihood ratios provided additional evidence of the utility of the assay, particularly for forensic applications where single profiles are often analyzed one at a time. Here the error rates in classifications considering either four or five population groups were low (1.26% and 1.54% respectively). The AIM-INDEL panel achieves very high accuracy for population assignment in the five broad continental regions, similar to results observed by Paschou et al. when using subsets of 50 SNPs ascertained by PCA and estimation of In metrics [48]. In our study, the great majority of misclassified individuals were from a single population (Yakut of eastern Siberia) localized near the northeastern fringe of the Asian continent. This intermediate position between East Asia and the American continent can explain differences in patterns of divergence between individuals and their misclassification as American. Likewise, the cross validation studies with five groups revealed two Cambodians misclassified as Oceanians. Together, these results suggest a weaker performance of the panel with differentiation of East Asians. In fact, the accumulated divergence assessed for EAS vs. others is slightly smaller than for the other ancestral groups, and the fact that the HGDP-CEPH EAS group analyzed is so diverse (229 individuals from 18 subpopulations) may contribute to this reduced differentiation for East Asians. Another important aspect is the proximity of the East Asian and Native American gene pools. Considering the history of modern humans these groups have diverged over the shortest time, and furthermore, the original peopling of the Americas from Beringia involved a significant bottleneck effect that is still reflected in Native American variability. Despite this slightly reduced level of differentiation in the AIM-INDELs selected, STRUCTURE, PCA and cross validation studies together support the capacity of the panel to properly distinguish both groups.

AIM panels are regularly applied in population genetics studies to analyze admixed populations by estimating admixture proportions both at the individual and population level. Depending on the historical context of populations under study, there are different principal ancestral contributors to the formation of the current ancestry characteristics of the region. For example, Brazil and the majority of South American countries underwent admixture between the pre-existent Native Americans, colonizing Europeans and later African influences resulting from the slave trade to create essentially tri-hybrid populations. In such cases, it is appropriate for genetic studies to perform three-group analyses of ancestry estimates. Our study analyzed ancestry proportions in Belém. We first considered the possibility of a fourth EAS minor ancestral contributor in initial analyses and K = 4 resulted in a low level but detectable fraction of membership of this cluster at 8.8% (Table 2). However, although not statistically significant (exact test of differentiation p value = 0.136), the three-group membership proportion estimates at K = 3 showed a noticeable increase in the Native American component to 29.5% (Table 2), which is in very close agreement with the admixture proportions previously reported for the same population but using a different set of AIM-INDELs (average NAM estimate: 28.4%; [27]). Nevertheless, there are persuasive arguments for a preliminary four-group analysis that considers all four potential contributors to admixture in these regions. In particular, some locations in Brazil (e.g. São Paulo, Campinas; IBGE – Instituto Brasileiro de Geografia e Estatística, www.ibge.gov.br) include significant East Asian communities, despite having joined these populations rather recently. When using Brazilian samples from such geographic areas, particularly as case and control samples for association studies, a preliminary four-group analysis is recommended to detect the presence of East Asian ancestry amongst individuals in the study. Otherwise there is considerable risk that the a priori rejection of this hypothesis based on three-group analyses could lead to an over-estimation of the Native American proportion in the global admixture estimates (data not shown) due to a strong bias caused by the presence of East Asian individuals in the population under study. Conversely, when "forcing" a four-group analysis in South American tri-hybrid populations, it is possible that the fourth East Asian component can produce a spurious fraction of membership arising from the Native American component, due to the close relationship of the East Asian and Native American population groups. In summary, we advocate adopting an approach taking due regard for the particular population under study. Consideration of the known recent population history and demographics helps make appropriate adjustment for the different principal ancestral contributors. In the special case of South American populations, we recommend a preliminary study taking advantage of the full potential of the AIM-INDEL assay to identify and possibly exclude East Asian study subjects, and subsequently perform a comprehensive three-group analysis. The AIM-INDEL assay can be efficiently used in


three-group analyses AFR/EUR/NAM, similarly to [27] and also AFR/EUR/EAS, as in [23]. Nonetheless, the reliability of the four-way analysis we repeatedly achieve with this multiplex allows a clear distinction of all groups.

In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on multi-step protocols and/or implementation of dedicated genotyping technologies. The AIM-INDEL assay produces accurate individual ancestry estimates of four different origins, which can be applied to the correction of false positive results due to population stratification between case and control samples in association studies. Most effectively it can be used as a simple and inexpensive tool for the initial screening of individuals prior to expensive GWA studies or to allow precise matching of ancestries amongst case and control samples. Finally, given the relatively high efficiency in population assignment of individuals from all five continental origins, the multiplex represents a tool of considerable potential in forensic applications.

Supporting Information

Figure S1 Analysis of population samples from four different continental origins using a preliminary set of 44 AIM-INDELs (without MID94 and MID1734). A) ancestral membership proportions (based on STRUCTURE results from 3 independent runs treated in CLUMPP and plotted with distruct); B) estimated ln probability of the data (-lnP(D) obtained with STRUCTURE and plotted using Structure harvester); C) principal component analysis 3D plots; D) estimation on population assignment success (results from one-out cross validation studies using the Snipper app suite; see methods for details on the analyses). Angola (Africa); Portugal (Europe); Taiwan (East Asia); Brazilian Amazonas tribes (Native America).
(PDF)

Figure S2 Ancestral membership proportions in the Brazilian city of Belém using HGDP-CEPH diversity panel genetic data of three main ancestral contributors as training sets. A) bar plots based on STRUCTURE results from 3 independent runs treated in CLUMPP and plotted with distruct (AFR: Africa; EUR: Europe; NAM: Native America); B) triangular plots based on STRUCTURE results from the run with highest -lnP(D) (left: admixture model; right: using population information; red: Africa; green: Europe; blue: Native American; yellow: Belém).
(PDF)

Figure S3 Analysis of HGDP-CEPH diversity panel samples from five continental origins using a set of 46 AIM-INDELs. A) ancestry membership proportions (estimated based on STRUCTURE results from 3 independent runs treated in CLUMPP and plotted with distruct; individuals were first sorted by geographic origin of population, and within those by ascending population code and HGDP individual number); B) estimated ln probability of the data (-lnP(D) obtained with STRUCTURE and plotted using Structure harvester); C) principal component analysis 3D plots; D) estimation on population assignment success (results from one-out cross validation studies using the Snipper app suite; see methods for details on the analyses). AFR: Africa; EUR: Europe; EAS: East Asia; NAM: Native America; OCE: Oceania.
(PDF)

Table S1 PCR primer sequences used in the multiplex.
(PDF)

Table S2 Allele frequencies, δ and FST values for the 46 AIM-INDELs in HGDP-CEPH diversity panel population samples from Africa (AFR), Europe (EUR), East Asia (EAS) and Native America (NAM).
(PDF)

File S1 Genotypic data (STRUCTURE format) for the 46 AIM-INDELs in HGDP-CEPH diversity panel population samples from Africa, Europe, East Asia and Native America.
(TXT)

Acknowledgments

The authors would like to acknowledge Dayse Alencar for the technical support with the samples from Belém. The assistance of Fundación Botín is also acknowledged with appreciation.

Author Contributions

Conceived and designed the experiments: RP AC LG. Performed the experiments: RP CS SEBdS. Analyzed the data: RP CP NP CS LG. Contributed reagents/materials/analysis tools: CP SEBdS AA AC LG. Wrote the paper: RP LG. Critically reviewed the manuscript: CP CS NP SEBdS AA AC.

References
1. International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
2. International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
3. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454.
4. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712.
5. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, et al. (2010) Diversity of human copy number variation and multicopy genes. Science 330: 641–646.
6. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12: 363–376.
7. Weber JL, David D, Heil J, Fan Y, Zhao C, et al. (2002) Human diallelic insertion/deletion polymorphisms. American Journal of Human Genetics 71: 854–862.
8. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, et al. (2006) An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Research 16: 1182–1190.
9. Mills RE, Pittard WS, Mullaney JM, Farooq U, Creasy TH, et al. (2011) Natural genetic variation caused by small insertions and deletions in the human genome. Genome Research 21: 830–839.
10. Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6: 95–108.
11. Donnelly P (2008) Progress and challenges in genome-wide association studies in humans. Nature 456: 728–731.
12. Hirschhorn JN, Gajdos ZK (2011) Genome-wide association studies: results from the first few years and potential implications for clinical medicine. Annual Review of Medicine 62: 11–24.
13. Marchini J, Cardon LR, Phillips MS, Donnelly P (2004) The effects of human population structure on large genetic association studies. Nature Genetics 36: 512–517.
14. Tian C, Gregersen PK, Seldin MF (2008) Accounting for ancestry: population substructure and genome-wide association studies. Human Molecular Genetics 17: R143–150.
15. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55: 997–1004.
16. Price AL, Zaitlen NA, Reich D, Patterson N (2010) New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11: 459–463.
17. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
18. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P (2000) Association mapping in structured populations. Am J Hum Genet 67: 170–181.
19. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904–909.


20. Luca D, Ringquist S, Klei L, Lee AB, Gieger C, et al. (2008) On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet 82: 453–463.
21. Seldin MF, Price AL (2008) Application of ancestry informative markers to association studies in European Americans. PLoS Genet 4: e5.
22. Frudakis T, Venkateswarlu K, Thomas MJ, Gaskin Z, Ginjupalli S, et al. (2003) A classifier for the SNP-based inference of ancestry. Journal of Forensic Sciences 48: 771–782.
23. Phillips C, Salas A, Sanchez JJ, Fondevila M, Gomez-Tato A, et al. (2007) Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. Forensic Science International: Genetics 1: 273–280.
24. Phillips C, Prieto L, Fondevila M, Salas A, Gomez-Tato A, et al. (2009) Ancestry analysis in the 11-M Madrid bomb attack investigation. PLoS ONE 4: e6583.
25. Kayser M, de Knijff P (2011) Improving human forensics through advances in genetics, genomics and molecular biology. Nat Rev Genet 12: 179–192.
26. Londin ER, Keller MA, Maista C, Smith G, Mamounas LA, et al. (2010) CoAIMs: a cost-effective panel of ancestry informative markers for determining continental origins. PLoS ONE 5: e13443.
27. Santos NP, Ribeiro-Rodrigues EM, Ribeiro-dos-Santos AK, Pereira R, Gusmão L, et al. (2010) Assessing individual interethnic admixture and population substructure using a 48-insertion-deletion (INDEL) ancestry-informative marker (AIM) panel. Human Mutation 31: 184–190.
28. Pereira R, Phillips C, Alves C, Amorim A, Carracedo A, et al. (2009) A new multiplex for human identification using insertion/deletion polymorphisms. Electrophoresis 30: 3682–3690.
29. Mullaney JM, Mills RE, Pittard WS, Devine SE (2010) Small insertions and deletions (INDELs) in human genomes. Human Molecular Genetics 19: R131–136.
30. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, et al. (2002) A human genome diversity cell line panel. Science 296: 261–262.
31. Rosenberg NA (2006) Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Annals of Human Genetics 70: 841–847.
32. Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, et al. (2005) Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet 1: e70.
33. Yang N, Li H, Criswell LA, Gregersen PK, Alarcon-Riquelme ME, et al. (2005) Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine. Human Genetics 118: 382–392.
34. Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, et al. (1997) Ethnic-affiliation estimation by use of population-specific DNA markers. Am J Hum Genet 60: 957–964.
35. Pereira R, Pereira V, Gomes I, Tomas C, Morling N, et al. (2011) A method for the analysis of 32 X chromosome insertion deletion polymorphisms in a single PCR. International Journal of Legal Medicine, In press; doi:10.1007/s00414-011-0593-2.
36. Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10: 564–567.
37. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587.
38. Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23: 1801–1806.
39. Rosenberg NA (2004) distruct: a program for the graphical display of population structure. Molecular Ecology Notes 4: 137–138.
40. R Development Core Team (2010) R: A language and environment for statistical computing. 2.11.1 ed. Vienna, Austria: R Foundation for Statistical Computing.
41. Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, et al. (2007) SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23: 644–645.
42. Lao O, van Duijn K, Kersbergen P, de Knijff P, Kayser M (2006) Proportioning whole-genome single-nucleotide-polymorphism diversity for the identification of geographic population structure and genetic ancestry. Am J Hum Genet 78: 680–690.
43. Halder I, Shriver M, Thomas M, Fernandez JR, Frudakis T (2008) A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications. Human Mutation 29: 648–658.
44. Kosoy R, Nassir R, Tian C, White PA, Butler LM, et al. (2009) Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America. Human Mutation 30: 69–78.
45. Nassir R, Kosoy R, Tian C, White PA, Butler LM, et al. (2009) An ancestry informative marker set for determining continental origin: validation and extension using human genome diversity panels. BMC Genet 10: 39.
46. Kersbergen P, van Duijn K, Kloosterman AD, den Dunnen JT, Kayser M, et al. (2009) Developing a set of ancestry-sensitive DNA markers reflecting continental origins of humans. BMC Genet 10: 69.
47. Lao O, Vallone PM, Coble MD, Diegoli TM, van Oven M, et al. (2010) Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA. Human Mutation 31: E1875–1893.
48. Paschou P, Lewis J, Javed A, Drineas P (2010) Ancestry informative markers for fine-scale individual assignment to worldwide populations. Journal of Medical Genetics 47: 835–847.
49. Kidd JR, Friedlaender FR, Speed WC, Pakstis AJ, De La Vega FM, et al. (2011) Analyses of a set of 128 ancestry informative single-nucleotide polymorphisms in a global set of 119 population samples. Investig Genet 2: 1.
50. Tandon A, Patterson N, Reich D (2011) Ancestry informative marker panels for African Americans based on subsets of commercially available SNP arrays. Genetic Epidemiology 35: 80–83.
51. Rosenberg NA, Li LM, Ward R, Pritchard JK (2003) Informativeness of genetic markers for inference of ancestry. Am J Hum Genet 73: 1402–1422.
52. Biswas S, Scheinfeldt LB, Akey JM (2009) Genome-wide insights into the patterns and determinants of fine-scale population structure in humans. American Journal of Human Genetics 84: 641–650.
53. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et al. (2002) Genetic Structure of Human Populations. Science 298: 2381–2385.
