American Indian mtDNA and Y Chromosome Genetic Data: A
Comprehensive Report of their Use in Migration and Other
Anthropological Studies
by
Peter N. Jones
July 31, 2004
________________________________________________________________________
A. INTRODUCTION

Over the past two decades physical anthropologists and molecular geneticists have begun to use molecular data in their attempt to answer several long-standing questions, especially those concerning migrations in human history. This new line of research has been called genetic anthropology, molecular anthropology, and archaeogenetics (see Renfrew & Boyle, 2000). For the purposes of this report the term molecular anthropology will be used. At present most of the new information in this field has come from DNA obtained from living populations, though a handful of studies have used ancient DNA (aDNA), molecular data recovered from historic bones, teeth, and other resources. Within this new line of research many procedures have been used, some more robust than others, allowing researchers to draw conclusions about the relationships between populations, both past and present. Researchers have therefore been able to hypothesize about present relationships between populations and about their demographic histories.

This report is a comprehensive analysis of the history, theory, and current state of the field of molecular anthropology, focusing on the use of mitochondrial DNA (mtDNA) and Y chromosome genetic material of North American Indians. The report begins with a methodology section, followed by a brief history of the development of this new line of research. Following these two sections is one covering the relevant terminology and concepts employed within the molecular anthropological field. Two sections then discuss the major findings of the field, the first focusing on mtDNA research and the second on Y chromosome research. Finally, the report ends with a summary and conclusion. An appendix is also included, abstracting every study that could be located and giving the title of the study, authors, citation, funding sources for the study, and genetic materials used.

B. METHODS

This study was conducted using the standard methods of a systematic analysis. Systematic analyses provide a rational synthesis of the research base and offer clear advantages to decision-makers. They attempt to overcome the deficiencies of narrative reviews and polemics by applying rigorous standards. Good systematic reviews take great care to find all the relevant studies (published and unpublished), assess each study for the quality of its design and execution, and combine the findings from individual studies in an unbiased manner. In this way they aim to present a balanced and impartial summary of the existing research evidence.

Accordingly, a comprehensive, in-depth literature review was conducted using the following keywords: American Indians, mtDNA, Y chromosome(s), genetics, genes, migration, DNA, haplogroup, haplotype, affiliation, North America, and Americas. These keywords were used to search the following indexes and databases: Web of Science (ISI); ABI Inform (OCLC); OCLC First Search; SilverPlatter; EBSCO; Cambridge Scientific Abstracts; National Library of Medicine; University of Colorado Catalogue; ScienceDirect; Anthropology Plus; Annual Review of Anthropology; Social Sciences Abstracts; Biological Sciences; and MedLine.

Both single-word searches and combination and Boolean word searches were conducted. Once all studies were located using the above search criteria, the bibliographies of each study were cross-referenced to see if any studies had been missed. Furthermore, a general search was conducted on the World Wide Web using the following search engines: Google, Yahoo, AltaVista, Lycos, and MSN.

Once all studies were located, they were compiled, abstracted, and searched for information concerning funding sources and genetic lines used in the study. All of the studies found are listed in alphabetical order in the Appendix.

C. MOLECULAR ANTHROPOLOGY: A REVIEW OF THE FIELD

The first attempt at reconstructing historical population movements (also referred to in the literature as demographic history, prehistoric or ancient migrations, prehistoric or ancient population affiliations, or phylogenetic relationships) on the basis of “classical” genetic data from samples taken from living populations was undertaken in 1963 by Cavalli-Sforza and Edwards in a pioneering paper entitled “Analysis of human evolution” (published in 1965). Classical genetic data use proteins and blood groups, as opposed to molecular genetic studies, which use mtDNA and Y chromosome data. Cavalli-Sforza and colleagues went on to compile their magisterial The History and Geography of Human Genes (Cavalli-Sforza, Menozzi, & Piazza, 1994), which relied primarily upon classical genetic markers sampled on a worldwide basis. This volume has been taken as marking the end of the “first phase” in the development of molecular anthropology, at a time when the use of such classical genetic markers was being replaced by DNA studies.

The “second phase,” with which this report is concerned, is currently in full spate and was initiated by the earliest papers using DNA sequencing to reconstruct human population histories. One of the first of these, entitled “Evolutionary relationships of human populations from an analysis of nuclear DNA polymorphisms” (Wainscoat et al., 1986), used nuclear DNA. The important paper by Cann, Stoneking, and Wilson (1987), entitled “Mitochondrial DNA and human evolution,” was one of the first to exploit the potential of mitochondrial DNA for studying specific lineages in the female line. These studies focused on the larger questions of human evolution, and not necessarily on the demographic history of North American Indians.

Recent studies, however, have focused extensively on North American Indian demographic histories using both mtDNA and Y chromosome data. These studies have

We all carry deoxyribonucleic acid (DNA) in every cell of our bodies, which has been passed down almost unchanged from our earliest ancestors. DNA is the messenger of heredity. A simple metaphor that helps explain much of the terminology used in molecular anthropology is that of an instruction manual. We can view DNA as a set of written instructions on how to build a human, with the chromosomes acting as volumes of the manual. Not surprisingly, these instructions are immensely complicated and nowhere near fully understood. Nonetheless, the language of the instructions is very straightforward. As in many languages, the meaning is contained within a sequence of symbols or letters, of which the genetic language contains four. These four symbols are the simple organic chemicals adenine, cytosine, guanine, and thymine, always referred to as A, C, G, and T. These four chemicals, the nucleotide bases, are joined together one after another in a long molecular chain that forms DNA. In fact, the DNA molecule consists of two strands, the famous double helix, each one containing the same information in its sequence of bases but in a complementary way. Therefore, when A appears in one strand, it is always opposite a T in the other. G and C are similarly matched.

When cells divide, DNA must be copied so that each daughter cell receives a full set of instructions. This is accomplished by unwinding the double helix and using each single strand as a template to make two new identical double helices. Because of the complementarity of the bases, the sequence remains intact. The copying mechanism is remarkably exact, but there are occasional mistakes, called mutations. It is these randomly introduced mutations that molecular anthropologists look for and compare.
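The base-pairing rule and the idea of a point mutation can be illustrated with a short sketch. The function names and the example sequences are hypothetical, chosen only for illustration; the code simply applies the A-T and G-C pairing and lists the positions at which two aligned sequences differ.

```python
# Illustrative sketch only: sequences and function names are assumptions,
# not taken from the report.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base complement of a strand (not reversed)."""
    return "".join(COMPLEMENT[base] for base in strand)


def point_differences(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


original = "ACGTTGCA"            # made-up template strand
copy_with_mutation = "ACGTAGCA"  # same strand with one base replaced

print(complementary_strand(original))
print(point_differences(original, copy_with_mutation))
```

Because each strand determines the other, complementing a strand twice returns the original sequence, which is why either strand can serve as a template during copying.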

Molecular anthropology does not compare the blood of one individual with that of another, but instead compares the frequencies of genetic polymorphisms in one population with those in others. A polymorphism exists when, as a result of mutations, individuals within a population differ at a given point in their DNA. This implies the presence of two or more alleles (alternative variants that are similar but not identical) located at a particular position, or locus, on a chromosome. The human genome (the collective name for all DNA in each cell) is organized into chromosomes, separate “volumes” of the genome that reside in the cell nucleus. Together the chromosomes contain the three billion symbols that make up the human genome. There are twenty-four different chromosomes in the human genome. Twenty-two of them are present in two copies in every cell, one copy inherited from the mother and one from the father. These twenty-two chromosomes are collectively known as the autosomes, which distinguishes them from the X and the Y sex chromosomes (chromosomes 23 and 24). Females have a pair of X chromosomes while males have both an X and a Y chromosome.

The classic biochemical approach for investigating historic population movements consists of taking samples, usually blood samples, from a well-defined human population and testing these to determine the presence or absence of alleles for the given polymorphisms under investigation. The number of individuals within the given sample of the population who possess a particular allele is then expressed as a gene frequency. In molecular studies, the particular combinations of alleles that are inherited together are called haplotypes, and several related haplotypes make up a haplogroup.

By documenting the various mutations found in a population (the population’s molecular frequency for a particular allele) and comparing these molecular frequencies to those of other populations, molecular anthropologists can begin to reconstruct the historic population movements of the two populations under study. This is usually done using theories modeling coalescent times and divergence times, which will be explained in further detail below.
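As a minimal sketch of the comparison described above, the following code turns haplogroup assignments into frequencies and measures how far apart two populations are. The sample data are invented, and the distance used here (half the sum of absolute frequency differences) is a deliberately simple stand-in chosen for illustration; it is not the coalescent-based modeling that the studies under review employ.

```python
from collections import Counter


def haplogroup_frequencies(samples: list[str]) -> dict[str, float]:
    """Express the count of each haplogroup as a share of the sample."""
    counts = Counter(samples)
    total = len(samples)
    return {group: counts[group] / total for group in sorted(counts)}


def frequency_distance(freq_a: dict[str, float], freq_b: dict[str, float]) -> float:
    """Half the summed absolute differences: 0 = identical, 1 = no overlap."""
    groups = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(g, 0.0) - freq_b.get(g, 0.0)) for g in groups)


# Hypothetical haplogroup assignments for two sampled populations.
population_1 = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
population_2 = ["A"] * 2 + ["B"] * 2 + ["D"] * 6

freq_1 = haplogroup_frequencies(population_1)
freq_2 = haplogroup_frequencies(population_2)
print(freq_1)
print(freq_2)
print(frequency_distance(freq_1, freq_2))
```

A summary like this says only how similar two sets of frequencies are today; turning that similarity into a date or a migration history requires the additional modeling assumptions discussed later in the report.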

The human genome contains one other piece of DNA, which is found not in the nucleus but in the mitochondria, small particles in the cell cytoplasm, and is called mitochondrial DNA (mtDNA). It is much smaller than the nuclear genome, with just over 16,500 bases compared with the three billion bases found in the nuclear DNA. The peculiar genetic characteristics of mtDNA, however, have made it a central component of molecular anthropological research.

As discussed above, there are several elements that make mtDNA particularly useful in the study of historic population movements. The first is that, unlike nuclear DNA, it is inherited from only one parent, the mother. This is because human eggs have a large cytoplasm full of mitochondria while human sperm contain only a few, and those either do not get into the fertilized egg or are eliminated shortly afterwards. This means that all people inherit all of their mtDNA from their mother, who inherited her mtDNA from her mother, and so forth. Therefore, in any past generation only one woman was an individual’s maternal, and hence mtDNA, ancestor. The second is that mtDNA does not undergo genetic recombination. Recombination is the device used by chromosomes to shuffle their genes at every generation, which has the evolutionary advantage that new, favorable gene combinations occasionally emerge. These two features have proved useful because they have allowed molecular anthropologists to track the rare mutations that arise between generations of the mtDNA, and thus to document mtDNA allele frequencies of particular maternal lines. It is these allele frequencies that can then be compared to other allele frequencies to calculate when the two maternal lines historically diverged.

The Y chromosome shares these two features of mtDNA, namely uniparental inheritance and a lack of recombination. Molecular anthropologists use variations that arise through mutations in both mtDNA and Y chromosome data to trace populations. Most mutations arise during the DNA-copying process prior to cell division. The simplest type of mutation, known as a point mutation, is the replacement of one base (A, C, G, or T) by another. This always happens in one individual cell in one individual person. To be passed on to the next generation, a mutation must occur in the so-called germ line cells that are the precursors of either eggs or sperm. All sorts of mutations occur in other body cells, but these are irrelevant to the study of historic population movements because they do not get passed on to the next generation. Furthermore, a mutation will have to increase in frequency within the population to be noticed at all. If the new mutation does not alter the biological fitness of the individuals carrying it, in other words if it is a neutral change, then the process by which it spreads, or is eliminated, is governed purely by chance and is referred to as genetic drift. Therefore, taking the Y chromosome as an example, suppose

frequently with a complex internal structure, are found more rarely, but when they are, such as MSY1 located on the Y chromosome, they can also be useful. The allele combinations found in these regions of the mtDNA and Y chromosome are called haplotypes. Many haplotypes make up a haplogroup, and it is this larger entity that is used to compare one population with another.

1. HAPLOTYPES AND HAPLOGROUPS

Although researchers have noted that limitations exist when studying only one gene (Chen et al., 2000; Karafet, Zegura, Vuturo-Brady, Posukh, Osipova, Wiebe, Romero, Long, Harihara, Jin, Dashnyam, Gerelsaikhan, Keiichi, & Hammer, 1997; Mountain & Cavalli-Sforza, 1997), most studies still rely on only one gene and its alleles because of the ease of identifying differences in a restricted location on that gene, especially in non-recombining genes such as mtDNA and the Y chromosome. The allele sequences that are studied are called haplotypes, which for American Indians presently fall into five larger recognized haplogroups (A, B, C, D, and X) that have been used in most studies concerning American Indian population genetics.

It is necessary to point out several assumptions underlying the uses of haplotypes and haplogroups. First, many studies use within-local-population frequencies for the genetic sequences, which are highly affected by each population’s specific recent demographic history, and the possibility exists that researchers will underestimate the nucleotide diversity of the population as a whole (Bonatto & Salzano, 1997a). Therefore, the differing results between CR (control region) sequences and RFLP (restriction fragment length polymorphism) sequence data cannot be explained either by sample size or attributed to the different ways in which the haplogroup frequencies were treated, but are more probably due to the different populations or regions of the DNA studied.

Likewise, as previously noted, the only changes introduced into genes are point mutations, insertions, and deletions (with insertions and deletions being rare in comparison to point mutations). This means that each possible founding lineage cluster can be thought of as containing the founding lineage haplotype plus a collection of that lineage’s descendants. However, there are several problems inherent in this assumption, notably that the original Y chromosome can eventually die out, shifting the time, haplotype frequency, or relationships of the population under study (Bradman & Thomas, 1998), which can result in faulty data when a present population’s haplotype frequencies are compared with those of an ancient population. As Bradman and Thomas (1998) pointed out using the insertion of the YAP (Y chromosome Alu polymorphism) indel (an insertion/deletion polymorphism) on the Y chromosome, the descendants of an individual may, after only one generation, not carry the same Y chromosome alleles. It is possible that a descendant of the individual who first acquired the YAP indel may lose that indel, yet still remain a descendant of that individual. A similar loss occurs with mtDNA, since a man’s children do not carry the mtDNA of their paternal grandmother. When only specific alleles are examined, mutations, insertions, and deletions can be viewed as coming from discontinuous populations. Furthermore, as Bianchi et al. have pointed out, “the combination of a decrease in the effective population size and genetic hitch-hiking may have been the cause producing a single variety of Y-chromosomes in the earliest ancestors of extant Amerindians” (1997, p. 87). If this is correct, then spurious results may arise when determining biological affiliation between populations. Finally, as noted, the mitochondrial genome undergoes no recombination, and therefore the 16,569-bp genome behaves evolutionarily as a single locus. As MacEachern (2000, p. 358) noted, “In particular, it appears that there may be significant variability in selection mechanisms on the genome itself and in the mitochondria and in rates of phylogenetic versus intergenerational mtDNA mutation that are only now being appreciated (Gibbons 1998; Parsons, Muniec, and Sullivan 1997).” Therefore, inferences from any one such locus lack robustness (Pamilo & Nei, 1998).

2. COALESCENT TREES AND GENE TREES

Once the haplotypes of a population have been determined, a unique gene tree can be constructed from the configuration of mutations under the assumption that point mutations arise at each site only once in time, without any back or parallel mutation. The coalescent tree is hypothetically a perfect phylogeny representing the mutation history of that haplotype back in time. The coalescent tree is equivalent to the DNA sequence data, and because it hypothetically represents the ancestry of the population, it is common to think of the DNA sequence data as a phylogenetic tree. It is important to remember, however, that the data are not independent, because the sequences are related through shared ancestry. The likelihood of a coalescent tree under a stochastic coalescent model of evolution can be found by advanced simulation techniques, thus allowing a maximum likelihood estimation of the parameters using the full information in the data. The distribution of the time to the most recent common ancestor and of the ages of mutations in the tree, conditional on its topology, can also be found by simulation techniques. Computing likelihoods by computer-intensive methods for samples of DNA sequences under general models is currently a very active research area. Some of the approaches used are Importance Sampling (Bahlo & Griffiths, 2000; Fearnhead & Donnelly, 2001; Griffiths & Marjoram, 1996; Griffiths & Tavare, 1994a; 1994b; 1994c; Nielsen, 1997; Slade, 2000a; 2000b); Markov Chain Monte Carlo (MCMC) by Felsenstein and colleagues (1995; 1997; 1998); and other approaches, such as MCMC of a Bayesian nature by Wilson and Balding (1998), Beaumont (1999), and Markovtsova et al. (2000a; 2000b).
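To make the idea of simulating the time to the most recent common ancestor concrete, here is a minimal sketch under the simplest textbook model, which is an assumption introduced here and not the method of any study cited above: a constant-size, randomly mating population of `n_effective` uniparentally inherited gene copies, in which `k` lineages wait an exponentially distributed time with mean `n_effective / (k(k-1)/2)` generations before two of them merge. All names and parameter values are hypothetical.

```python
import random


def expected_tmrca(sample_size: int, n_effective: float) -> float:
    """Closed-form mean time (generations) to the common ancestor: 2N(1 - 1/n)."""
    return 2.0 * n_effective * (1.0 - 1.0 / sample_size)


def simulate_tmrca(sample_size: int, n_effective: float, rng: random.Random) -> float:
    """Draw one time to the most recent common ancestor under the basic coalescent."""
    total_time = 0.0
    for lineages in range(sample_size, 1, -1):
        pairs = lineages * (lineages - 1) / 2
        # Rate of coalescence per generation while `lineages` remain.
        total_time += rng.expovariate(pairs / n_effective)
    return total_time


rng = random.Random(1)  # fixed seed so the run is repeatable
runs = [simulate_tmrca(20, 1000, rng) for _ in range(5000)]
print(sum(runs) / len(runs), expected_tmrca(20, 1000))
```

Two features of this sketch bear on the cautions raised in this section: the individual simulated times vary widely around their mean, and the mean scales directly with the assumed population size, so a smaller ancestral population yields a more recent common ancestor.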

One important quantity in classical population genetics is the inbreeding coefficient, a measure of mating between relatives that is defined as the probability that a pair of genes at a locus is identical by descent. A pair of genes is considered identical by descent if both are derived from the same gene in a common ancestor (Crow & Kimura, 1970), and Gustave Malécot (1955) was one of the first to distinguish this concept clearly from identity-in-state. The inbreeding coefficient is intimately related to the effective number of individuals in a population, that is, the size of an ideally behaving population that would show the same decrease in heterozygosity as the observed population. The effective number or effective population size is used if there are fluctuations in the population number from time to time, or if the distribution of number of progeny per parent is nonbinomial, or if there is any other kind of deviation from the idealized model that has been assumed (Crow & Kimura, 1970). Therefore, in a hypothetically random mating

researchers are not sure whether nucleotide sites undergo a constant mutation or substitution rate.

Because of this uncertainty, researchers have concluded that frequent intragenic recombination will lead to erroneous estimates of ancestral polymorphism. Two incompatible requirements therefore arise. To infer an accurate gene genealogy, researchers must look at long stretches of DNA in which a sufficiently large number of nucleotide differences can be observed. On the other hand, such long stretches are likely to undergo intragenic recombination, resulting in faulty genealogies.

Likewise, it is essential that the researcher, in using genetic frequencies and coalescent times, does not assume that these are the same as the times of origin for the population under study or the times when one population split from another (i.e., biological affiliation). Although tracing the genealogy of mtDNA or Y chromosome allele frequencies can theoretically lead to a single common ancestor, this is not evidence that the populations under study went through a period when only one ancestral breeding population was alive and reproducing. Tracing the coalescent times leads to one ancestor of a unilineally transmitted set of markers (either through the maternal or paternal line), but the descendants of the original DNA will have had haplotype frequencies that differed from those of the entire population, resulting in a biased sample of the total historic population’s frequencies when using coalescent times. This is because working back in time does not allow one to take into account the various branches of diversity that the historic population had, but only the lineal history of the specific marker being coalesced. Three primary assumptions arising from the use of coalescent times (Hoelzer, Wallman, & Melnick, 1998; Hudson, 1990; Templeton, 1993; 1998; 2002) that have been employed specifically in understanding American Indian historic population movements are:

  1. gene coalescence is a regular process of mutation accumulation in neutral systems, and therefore can be timed like a regularly ticking clock with an acceptable range of error;
  2. American Indian populations were isolated from each other after they originated or migrated to the Americas; and
  3. the history of particular gene systems is the history of the specific populations in which they are found.
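The first assumption can be made concrete with a minimal sketch of clock-based dating. It uses the standard relation T = d / (2μ), where d is the proportion of sites that differ between two lineages and μ is the substitution rate per site per year; the factor of two reflects that both lineages accumulate changes after they split. The function name and all numerical values below are hypothetical and are not estimates from any study discussed in this report.

```python
def divergence_time_years(differences_per_site: float, rate_per_site_per_year: float) -> float:
    """Clock estimate T = d / (2 * mu), assuming a constant substitution rate."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return differences_per_site / (2.0 * rate_per_site_per_year)


observed_difference = 0.01  # made-up proportion of differing sites

# The same observed difference dated under two assumed (hypothetical) rates.
for assumed_rate in (1e-7, 2e-7):
    years = divergence_time_years(observed_difference, assumed_rate)
    print(assumed_rate, round(years))
```

Doubling the assumed rate halves the estimated date, so any uncertainty about whether the clock ticks regularly carries straight through to the inferred time.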

However, as already mentioned, human populations are not neutral systems, and it is not clear if it is safe to assume that mutations occur in a regular, timely fashion. Furthermore, as much of the American Indian ethnographic, linguistic, and archaeological data demonstrates, American Indian populations were never isolated, either from each other or possibly from ancestral populations in Asia (for discussions on the latter aspect, see Akazawa, 1999; Anderson & Gillam, 2001; Bever, 2001; Ikawa-Smith, 1982; Tarazona-Santos & Santos, 2002).

One important requirement in coalescence theory is the use of random samples of genes from the population under study. However, this is extremely difficult to accomplish, all the more so when studying historical relationships between ancient populations and their possible descendants. As Donnelly and Tavare (1995) point out,

In practice, genetic data are typically obtained from convenience samples rather than proper random samples. There is an obvious danger that such data may contain individuals who share relatively too much ancestry on the relevant timescales. The extent to which application of coalescent (or traditional) methods to such convenience samples may be misleading remains an open, and potentially serious, question. (p. 418)

Furthermore, most studies that research American Indian historic population movements rely on the idea that American Indians came to the Americas from Asia in small groups (usually thought to have occurred as part of one to three migration waves) across the Bering Land Bridge in prehistoric times. If this is the case, coalescence times will be shorter than actual population divergence times because smaller populations in the past are more likely to share ancestors (Donnelly & Tavare, 1995, p. 410), leading to an accelerated time of origin for American Indians, and thus not correctly representing occupational time depth or biological affiliation.

Similarly, departures from random mating due to inbreeding, assortative mating, or population stratification can lead to non-random association between genotypes and further complicate the interpretation of the data and coalescent times. As Karafet et al. (1997) concluded, because of the presence of the 1T haplotype (a Y chromosome combination haplotype) in both northeastern Siberia and the Americas, the possibility of historic and prehistoric back-migration is extremely likely. Similar studies have also noted the possibility of gene transfer or the “hitch-hiking theory” among American Indian and Asian populations (Bianchi, Bailliet, Bravi, Carnese, Rothhammer, Martinez-Marignac, & Pena, 1997; Bradman & Thomas, 1998; Hudson, 1990). Because population-coalescence times are frequently a result of the fusion of several of the ancient phylogenetic clusters and not necessarily the age of individual populations (Watson, Forster, Richards, & Bandelt, 1997), faulty results may be reported. Therefore, using gene coalescent times as possible times of origin for American Indians can lead to spurious conclusions, for there is no evidence that 1) American Indians were ever part of a neutral system that can be timed like a regularly ticking clock, 2) they were ever isolated from each other or from Asian populations, or 3) the current gene systems found in a particular population fully represent the historical diversity of that population.

The mtDNA and Y chromosome sections of the human genome have proven to be the most useful for studying historical population movements because of their ease of replication and amplification, as well as the fact that they are non-recombining. However, the larger theoretical assumptions underlying how molecular anthropologists reconstruct particular population allele frequencies are still nascent. Jeffreys et al. (1985) introduced individual-specific “fingerprints” for multiple loci, which were later applied to single-locus variable numbers of tandem repeat (VNTR) polymorphisms and short tandem repeats (STRs). In parallel with these molecular advances, which made DNA typing more sensitive and reliable, the mathematical theory became more precise. In rapid succession three main obstacles were overcome: population structure, kinship,

are the most precise. For example, computer simulations that suggest that the four major haplogroups found among American Indians underwent a bottleneck followed by a large population expansion may be questioned. These simulations are based primarily on the analysis of CR sequences from haplogroup A and do not take into account haplogroups B, C, D, and X. Similarly, although most studies investigating human population movements have used sequence diversity as a measure of age, few have investigated whether their samples met the very stringent assumptions required by this practice (Bonatto & Salzano, 1997a, p. 1417). Furthermore, Bonatto and Salzano (1997) have also noted that studies using RFLPs have found that haplogroup B has a much lower diversity than the other four (A, C, D, X), which would lead to inaccurate computer simulations. Therefore, for example, the current dates from mtDNA and Y chromosome studies contending that American Indians arrived in the “New World” around 35, years ago can be questioned (Bonatto & Salzano, 1997a; Bonatto & Salzano, 1997b; Brown, Hosseini, Torroni, Bandelt, Allen, Schurr, Scozzari, Cruciani, & Wallace, 1998). This number is actually the time during which American Indians theoretically experienced an expansion after a bottleneck. However, it is unknown if this bottleneck took place in Asia, the generally accepted origin of American Indians, or in the Americas after their arrival, nor is it known what effects subsequent migrations and bottlenecks from disease and other factors have on this time estimation.

Adequate sample sizes are also critical if the genetic frequencies used to characterize a population are to be considered reliable. Typically in studies addressing American Indian historic population movements, sample sizes range between four and 30 individuals per tribal population; this is insufficient to detect more than the most common haplotypes in each population. Although it is necessary to have genetic samples from 50 males or 50 females of an individual population to accurately infer genetic demographic history, very few studies have done this. The largest study to date on American Indians dealt with 2,198 males from 60 global populations, including 20 American Indian groups (Karafet et al. 1999; this study relied on large amounts of data gathered from previously published reports, and thus could not correct for those sample sizes). However, only the Inuit Eskimo and Navajo samples exceeded 50, at 62 and 56 males respectively. All others ranged from as high as 44 to as low as two individuals. It is unrealistic to assume that one can get an accurate picture of a tribe’s genetic frequencies using only two males. In fact, Weiss (1994, p. 834) suggests that we may not be able to distinguish loss of lineages after one migration from separate migrations out of a common source population, thus further stressing the critical need for adequate population sample sizes. As Ward et al. (1993) have noted, a sample size of 25 will detect ~63 percent of the lineages in a tribe with normal diversity. In tribes with extensive diversity a sample size of 25 individuals will only detect ~40 percent of the lineages, and sample sizes of 70 or above are required to detect two-thirds of the lineages.
The fact that the majority of studies lack the required sample sizes necessary to detect even 63 percent of the lineages in a normally diverse tribe brings into question many of the results of these studies, especially when it has been noted that most American Indian tribes are believed to have a high level of diversity (Ward, Alan Redd, Valencia, Frazier, & Paabo, 1993).
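The dependence of lineage detection on sample size can be illustrated with a short calculation. The sketch below is not the procedure used by Ward et al. (1993); it assumes simple random sampling of unrelated individuals, in which case a lineage with population frequency p is seen at least once in a sample of n with probability 1 - (1 - p)^n. The function name and the lineage frequencies are hypothetical and chosen only for illustration.

```python
def expected_fraction_detected(lineage_freqs, sample_size):
    """Expected share of lineages observed at least once in a random sample.

    Assumes independent draws from the population (no close relatives),
    which is an idealization rather than a property of any real dataset.
    """
    if abs(sum(lineage_freqs) - 1.0) > 1e-9:
        raise ValueError("lineage frequencies must sum to 1")
    # A lineage of frequency p is missed by all n draws with probability (1 - p)^n.
    detected = sum(1.0 - (1.0 - p) ** sample_size for p in lineage_freqs)
    return detected / len(lineage_freqs)


# Hypothetical population: 4 common lineages plus 36 rare ones (not real data).
hypothetical_freqs = [0.15] * 4 + [0.4 / 36] * 36

for n in (2, 25, 50, 70):
    share = expected_fraction_detected(hypothetical_freqs, n)
    print(f"sample size {n:>2}: expected share of lineages detected = {share:.2f}")
```

Under these assumptions the expected share rises steadily with sample size, and rare lineages account for most of what small samples miss.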

As has been discussed, prehistoric migrations are difficult to reconstruct from mtDNA and Y chromosome data. The most meaningful measure of migration from a genetic point of view is obtained by taking the generation as the time unit. Measuring the distribution of distances between the birthplaces of parent and offspring theoretically can yield a statistical measure of migration. However, this method works only for a continuous model in which the population is constant, and is not entirely satisfactory when the population is highly clustered, as most prehistoric populations are believed to have been (Cavalli-Sforza & Bodmer, 1971, p. 433). A similar limitation in using such data to infer migrations is that exchange between non-neighboring clusters may have been frequent enough among prehistoric populations to violate the rules of the simplest stepping-stone models (Cavalli-Sforza and Bodmer 1971).

A final aspect of human DNA confounding many of the current uses of these data to reconstruct human population movements is that human mtDNA variation is high. Likewise, genetic variation within populations is much greater than between populations (Walpoff, 1999, p. 551). What this means is that mtDNA evolution, and possibly the evolution of other genetic systems, is not the same as the evolution of particular populations. As Scozzari et al. (1999) have noted, groups or tribes thought to have descended from a common ancestor more than 10,000 years ago may have lost even the shared-by-descent portion of their gene pool and can no longer be detected as affiliated through genetic analysis. Likewise, population-specific mutations and the gene trees inferred from these sequences are generally inconsistent with historic and prehistoric population affiliation. Page and Charleston (1990) have identified a method for visualizing and quantifying the relationship between a pair of gene and species trees that constructs a third, reconciled tree. Reconciled trees provide a more rigorous method for mapping the combined history of genes and populations. However, even this more accurate method of depicting gene and population trees has limitations such as allele phylogenies and horizontal transfer, neither of which has been addressed in studies concerning American Indian historic population movements. In fact, many of the polymorphisms observed for mtDNA probably predate population separations (Mountain & Cavalli-Sforza, 1997) and would not be useful in constructing genetic, population, or reconciled trees. Mitochondrial DNA or Y chromosome lineages are not human populations. In order to estimate the significance of variation of gene frequencies between groups, it is necessary to estimate how large a sample must be in order to be representative of the group.
This can only be accomplished if an accurate estimate of the real variation to be expected in the gene frequencies is possible. This estimation is valid only for genes without dominance, in which case genes can be counted. However, if people in the sample from a given tribal village or town are closely related, a single source of variation may greatly inflate the estimate of variance between populations (Cavalli-Sforza and Bodmer 1971, p. 422). Multivariate analysis, or the use of more than one trait or gene, which is presently the most commonly employed method of analysis, poses more difficult problems in that one must determine the maximum number of genes possible for each population in order to be accurate. Unfortunately, many authors have tested only a small set of markers on one gene (univariate) for their studies (Cavalli-Sforza, Menozzi, & Piazza, 1994, p. 22), combining their data with those of others to arrive at several sets of markers for their multivariate analysis.
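The link between sample size and the expected variation in a gene frequency can be made concrete with the standard binomial sampling error for a frequency obtained by direct gene counting. The sketch below assumes a marker without dominance and a random sample of unrelated individuals; the function name and the example frequency of 0.5 are hypothetical. For haploid systems such as mtDNA or the Y chromosome each individual contributes one gene copy, whereas an autosomal codominant marker contributes two.

```python
import math


def allele_freq_standard_error(p, n_individuals, ploidy=2):
    """Binomial standard error of an allele frequency estimated by gene counting.

    p             -- assumed true allele frequency (between 0 and 1)
    n_individuals -- number of unrelated individuals sampled
    ploidy        -- gene copies counted per individual (1 for mtDNA or Y, 2 for autosomal)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    genes_counted = ploidy * n_individuals
    return math.sqrt(p * (1.0 - p) / genes_counted)


# Illustrative comparison for a haploid marker at an assumed frequency of 0.5.
for n in (2, 25, 50):
    se = allele_freq_standard_error(0.5, n, ploidy=1)
    print(f"n = {n:>2}: standard error = {se:.3f}")
```

Because the formula assumes independent gene copies, closely related individuals in a sample reduce the effective number of genes counted, so the real uncertainty is larger than this estimate.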

GCN (GC 1A3), to a list of markers that indicate an Asian connection. Although the Diego DIA gene is not present in all American Indian groups, when it is detected DIA occurs only in American Indians or Asians. The frequency of the immunoglobulin haplotype GMA T in Asian populations reaches 50 percent in central Mongolia but is at a lower frequency in North American Indian groups. Similarly, GMA G is found at frequencies ranging from 86 percent in the Chukchi of Siberia (Schanfield, Crawford, Dossetor, & Gershowitz, 1990) to 56 percent among the Ainu of Japan (Matsumoto & Miyazaki, 1972). In North American Indian populations, this GM marker varies from 98 percent among the Northern Cree to 47 percent in a mixed Alaskan group (Schanfield, Crawford, Dossetor, & Gershowitz, 1990). Less is known about the geographic distribution of the complement B2 allele and Factor 13B3; however, preliminary analyses suggest that these alleles occur at higher frequencies in both the American Indian and Asian groups.

In many of the other genetic systems, e.g. the human leukocyte antigen (HLA) system, the various blood groups, and even the mitochondrial DNA (mtDNA) Asian haplotypes, most of the forms occur in some other populations of the world, but at different frequencies. American Indians share the four major haplogroups (A-D) with Asian populations (Torroni & Wallace, 1995). In addition, Siberian and American Indian populations share two identical mitochondrial DNA haplotypes, namely S26 (AM43) and S13 (AM88). The S and AM designations represent the same haplotypes, defined by the presence or absence of the specific restriction sites, in Siberian and American Indian populations. From these two haplotypes, Torroni et al. (1993) attempted to reconstruct the time of divergence of the Asian and American Indian mtDNA variation. These differences in the frequencies of some of the genetic markers led these researchers to conclude that American Indian populations are the result of small founding groups, unique historical events, and possibly the action of natural selection over a span of 15,000 to possibly 40,000 years.

William Boyd (1952) was one of the first to use classical genetic data to compare American Indians to other world populations. He believed that American Indians as a whole were distinct from other major continental populations in their blood-group frequencies. He proposed a single American Indian serological grouping, one of seven such major groupings. In the volume Biomedical Challenges Presented by the American Indian (1968) Miguel Layrisse summarized the distinctive patterns of frequencies in American Indian populations (see Table 1).

Table 1

High-incidence markers      Low-incidence or absent markers
ABOO                        ABOA
MNM                         ABOB
RHR1                        RHRO
FYA                         LUA
DIA                         KK
ABH*SE                      LEA+
                            Abnormal hemoglobin

Source: After Layrisse (1968)

During the past two decades, innovations in biochemical genetics and serology have produced a plethora of genetic markers that can be utilized to evaluate populational affinities and movements. Since Layrisse’s compilation of 1968, a number of new genetic markers have been identified through electrophoresis, isoelectric focusing and immunologic techniques. Two of the most informative sets of genetic markers encode the immunoglobulins (GMs and KMs) and the human leukocyte antigens (HLA system). The newer markers that distinguish American Indian populations are listed in Table 2.

Table 2

High-incidence                      Low-incidence or absences
HLAA2, HLAA9, HLAW28, HLAA10        HLAA1, HLAA3, HLA*A
HLABW15, HLABW16, HLABW40           HLAB29, HLA*B
GMA G, GMA T                        GMF B, GMA,F B
GC1S, GCIGLOOLIK                    BF*F
GC*CHIP
ALBMEX, ALBNASK
TFDCHI, TFBO-
CHE1S, CHE2+

According to Lampl and Blumberg (1979), the HLAA2 allele has the highest incidence among American Indians. In addition, HLAA9, HLAW28, and HLAW31 are common alleles. Bodmer and Bodmer (1973) note the absence of HLAA1, HLAA3, HLAA10, HLAA11, and HLAW29 from American Indian populations. Furthermore, North and South American Indian populations can be distinguished on the basis of HLAAW31 and HLAW15, which occur at high frequencies among South American Indian groups, whereas HLAW28, HLAA9, and HLAW5 are North and Central American markers.

existence of another founding lineage. However, Schurr et al. (1990) explained these results in terms of the loss of this founding haplotype from the three American Indian tribes they had studied.

A more extensive analysis by PCR amplification with 14 endonucleases of 321 American Indians from 17 populations confirmed the presence of the four different lineages or haplogroups A, B, C, and D that account for 96.9 percent of American Indian mtDNA variation and are of Asian ancestry (Torroni, Schurr et al., 1993; Torroni & Wallace, 1995). These findings supported the hypothesis that the four American Indian mtDNA haplogroups resulted from four separate demic expansions. Three of the four haplogroups (A, C, and D) observed in the Americas are present in indigenous Siberian populations (Torroni, Sukernik, Schurr, Starikorskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993). Presently, none of the Siberian populations have exhibited haplogroup B. The presence of haplogroup B in Asia and the Americas, and its absence from Siberia, is suggestive of its separate expansion into the Americas, possibly prior to the peopling of Siberia (circa 20,000 years ago). When dates of divergence were calculated using the mtDNAs of American Indians and Siberians, they fell between 17,000 and 34,000 ybp (Torroni, Sukernik, Schurr, Starikorskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993).

Torroni and Wallace (1995) reported that out of 743 American Indians tested to date, 25 individuals, scattered among eight tribal groups of North, Middle, and South America, displayed some mtDNA variants that differed from the four common haplogroups (A, B, C, D). They suggested that these variants may be the result of: 1) a second mutational event; 2) possible admixture with Europeans or Africans; or 3) additional Asian haplogroups brought into the New World by Siberians. Thus, Torroni and Wallace cautioned researchers against classifying mtDNAs from Old World populations by using only the primary variants found in American Indian mtDNA. They went on to point out that the 9-bp deletion had occurred independently in different regions of the world. This conclusion was supported by Soodyall et al. (1996), who discovered the presence of the so-called Asian-specific deletion in sub-Saharan Africa. From these data it appears that this 9-bp deletion arose independently at least twice, once in Asia and once in Africa. Furthermore, Bailliet et al. (1994) have proposed the existence of as many as ten possible mtDNA founder haplotypes in American Indian populations. However, others believe that some of these haplotypes may be due to mutations in American Indian populations and/or admixture with Europeans (Bianchi & Rothhammer, 1995).

Ward and his colleagues (1991) sequenced a 360-nucleotide segment of the mtDNA control region from 63 individuals of the Nuu-Chah-Nulth (or Nootka) from Vancouver Island. They identified 28 mtDNA lineages as defined by 26 variable positions within the control region. Ward et al. (1991) computed the average sequence divergence between the lineage clusters using a maximum rate of evolution of 33 percent divergence per million years for the control region. They obtained a range of 41,000-78,000 years, with an average of 60,000 years. These data suggest that the mitochondrial lineages within a single American Indian tribe diverged approximately 60,000 years ago. They interpreted these data as evidence that the lineages were established prior to the American Indian entry into the Americas, and they concluded that the founding populations of American Indians contained considerable genetic diversity.
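The arithmetic behind such age estimates can be sketched as a strict molecular clock calculation, in which time equals sequence divergence divided by the divergence rate. The sketch below is not a reconstruction of the analysis by Ward et al. (1991); it simply applies the stated rate of 33 percent divergence per million years. The function names are hypothetical, and the divergence values are back-calculated from the reported ages for illustration rather than taken from the study.

```python
RATE_PER_MILLION_YEARS = 0.33  # 33 percent divergence per million years, as stated in the text


def divergence_time_years(sequence_divergence, rate=RATE_PER_MILLION_YEARS):
    """Convert a pairwise sequence divergence (as a proportion) into years.

    Assumes a constant rate of evolution (a strict molecular clock).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sequence_divergence / rate * 1_000_000


def implied_divergence(years, rate=RATE_PER_MILLION_YEARS):
    """Inverse calculation: the divergence implied by a given age at this rate."""
    return years / 1_000_000 * rate


# Back-calculated, illustrative values only (not reported in the source).
for years in (41_000, 60_000, 78_000):
    d = implied_divergence(years)
    print(f"{years:>6} years corresponds to about {d * 100:.2f} percent divergence")
```

Because the estimate scales inversely with the assumed rate, halving the rate doubles every date, which is one reason such dates are sensitive to the calibration chosen.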

Merriwether et al. (1995) extensively investigated the geographic distribution of the four founding mtDNA lineage haplogroups in American Indian populations. They observed a north-south increase in the frequency of haplogroup B, accompanied by a north-south decrease in the frequency of haplogroup A. Based upon the extensive distribution of the four lineages in the Americas, Merriwether and his colleagues concluded that the pattern is consistent with a single migratory wave from Siberia into the Americas, followed by genetic divergence. However, these data can also be interpreted to represent a number of migrations from Siberia reintroducing the same haplogroups.

After this study, Merriwether et al. (1996) went on to compare mtDNA RFLPs from Mongolians of Ulan Bator with an array of frequencies of the founding lineage haplogroups in American, Asian, and Siberian populations, revealing considerable similarity between the Mongolian and American populations (Merriwether, Hell, Vahlne, & Ferrell, 1996). In this study the haplogroups were further subtyped into A1, A2, B1, B2, C1, C2, D1, D2, X6, X7, and “others.” Unlike the Northeastern Siberian populations, this Mongolian sample exhibited all four of the American primary haplogroups and shared the highest number of haplotypes with American Indian populations. As mentioned, haplogroup B has not been detected in any of the Siberian populations in closest proximity to the Bering Strait. However, this haplogroup occurs at a frequency of 75 percent among the Atacameno and 50 percent among the Pima, but is absent in a number of other American Indian groups (e.g., Makiritare, Dogrib, and Haida).

The vast majority of mtDNAs from modern American Indian populations belong to five haplogroups, which have been designated A–D and X (Brown, Hosseini, Torroni, Bandelt, Allen, Schurr, Scozzari, Cruciani, & Wallace, 1998; Forster, Harding, Torroni, & Bandelt, 1996; Schurr, Ballinger, Gan, Hodge, Weiss, & Wallace, 1990; Torroni, 1994; Torroni, Chen et al., 1993; Torroni et al., 1994; Torroni, Schurr, Cabell, Brown, Neel, Larsen, Smith, Vullo, & Wallace, 1993; Torroni et al., 1992; Torroni, Sukernik, Schurr, Starikorskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993). Each of these is distinguished by a unique combination of coding region RFLPs and HVR-I sequence polymorphisms. Together, they comprise 95–100 percent of all mtDNAs in indigenous populations of the Americas (Schurr & Sherry, 2004; Schurr & Wallace, 2002). The same pattern of variation is also observed in ancient Amerindian samples (Carlyle, Parr, Hayes, & O'Rourke, 2000; Fox, 1996; Kaestle, 1995; 1997; 1998; Kaestle & Smith, 2001a; Lalueza, PerezPerez, Prats, Cornudella, & Turbon, 1997; Merriwether, Rothhammer, & Ferrell, 1994; Monsalve, Edin, & Devine, 1998; O'Rourke, Hayes, & Carlyle, 2000b; Parr, Carlyle, & O'Rourke, 1996; Ribeiro-dos-Santos, Guerreiro, Santos, & Zago, 2001; Ribeiro-dos-Santos, Santos, Machado, Guapindaia, & Zago, 1997; Stone & Stoneking, 1998). Therefore, these five haplogroups are clearly the main founding mtDNA lineages in American Indian populations. However, a certain number of haplotypes not belonging to these five maternal lineages have been detected in different American Indian groups (Bailliet, Rothhammer, Carnese, Bravi, & Bianchi,