Received: December 14, 2013; accepted: December 16, 2013; published: January 8, 2014
Correspondence: gurianov.vm@gmail.com, napobo3@gmail.com, acgt@yfull.com
Phylogenetic Structure of Q-M378 Subclade Based On Full Y-Chromosome Sequencing
Vladimir Gurianov 1
Leon Kull 2
Roman Sychev 3
Vladimir Tagankin 3
Vadim Urasin 3
1 The Q-L275 Research Project, Russia,
2 Full Genomes Corporation, USA,
3 YFull research group, Russia.
Abstract
The Q-M378 subclade, downstream of haplogroup Q-L275, is widely distributed across Eurasia but accounts for only a minor share of its modern populations. The phylogenetic structure of the subclade known so far did not allow Y-chromosome SNPs to be matched to specific populations or the possible directions of their past migrations to be reconstructed. This research enabled us to form a consistent phylogenetic structure of the Q-M378 subclade, validated by analysis of SNP and STR markers and based on full Y-chromosome sequencing data obtained with next-generation sequencers. As part of the research, the new phylogenetic levels Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245), and Q-Y2200 (downstream of Q-Y2220) were defined. SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also identified. The research also confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language carriers from Central Asia via Afghanistan and Iran to the West.
Introduction
The Q-M378 subclade 1, downstream of haplogroup Q-L275, is present in a number of populations in Europe, Southwest (Western) 2 and Southern Asia 3, and also in Central Asia all the way to North-West China 4.
1 Y-DNA Haplogroup Q and its Subclades 2013 - http://www.isogg.org/tree/ISOGG_HapgrpQ.html. Hereinafter, subclades are referenced in line with the ISOGG (International Society of Genetic Genealogy) notation, which specifies the single nucleotide polymorphism (SNP) typical of the respective subclade.
2 Cinnioglu et al., Excavating Y-chromosome haplotype strata in Anatolia, 2003. Haplotypes 337-339 are positive for SNP M378 according to the predictor by Urasin (http://predictor.ydna.ru/). All samples belong to the Central Anatolian and East Anatolian regions of Turkey.
3 Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/ (among the tested inhabitants of Pakistan, 2 out of 176, or 1.14%, were positive for SNP M378; SNP M378 was not identified among sample groups in India and Eastern Asia).
4 Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, First published online: September 13, 2010, doi: 10.1093/molbev/msq247 (among four populations of Uyghurs from Xinjiang, one such person was found in each of two populations: 1 out of 71 and 1 out of 18).
One of the peculiar features of the Q-M378 subclade is its relatively wide area of distribution (connected with migrations of ancestral populations of the Indo-European language family) combined with an extremely low percentage in almost all populations (modern ethnic groups) where it has been reported so far. The exception is the Jewish Diaspora (primarily Ashkenazi Jews), where the share of the Q-M378 subclade reaches 5.2 to 7 percent (Behar 2004 5, Hammer 2009 6). Therefore, Q-M378 is often associated with the Middle East. However, a more comprehensive analysis of research data and publicly available commercial test data points to more complex and rather unobvious correlations between carriers of this Y-chromosome mutation over the last millennium.
5 Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM, Quintana-Murci L, Ostrer H, Skorecki K, Hammer MF. (2004). "Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations". Hum Genet 114 (4): 354-365. doi:10.1007/s00439-003-1073-7. PMID 14740294
6 Hammer MF, Behar DM, Karafet TM, et al. (November 2009). "Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood". Human Genetics 126 (5): 707-717. doi:10.1007/s00439-009-0727-5. PMC 2771134. PMID 19669163.
The Russian Journal of Genetic Genealogy ( ): 5, 1, 2013 ISSN: 1920-2997 http://ru.rjgg.org RJGG
The aim of this article is to specify the phylogenetic structure of the Q-M378 subclade, based on data available from open sources and on the data of the conducted research, and to classify its major clusters (haplotypes grouped according to the following criteria: belonging to the lineage of a single SNP (single nucleotide polymorphism), phylogenetic similarity, and geographical distribution).
Source data and methodology
Data sets for comparison
Data from the Personal Genome Project 7 and the 1000 Genomes Project 8 were used in this research. Samples taken from these projects (Table 1) have the prefixes PGP and HG, respectively.
7 http://www.personalgenomes.org/ See also: Ball, M.P., et al., A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences, 2012. 109(30): p. 11920-11927.
8 http://www.1000genomes.org/ See also: 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56-65.

Table 1. Information based on the data from The Personal Genome Project and 1000 Genomes Project.

Sample code   Population      Verified origin
HG03914       Bengali (BEB)   Bangladesh
HG03652       Punjabi (PJL)   Pakistan (Lahore)
HG03864       Telugu (ITU)    India
PGP130        N/A             Northern Africa (Morocco)
Samples HG03914, HG03652, and HG03864, which do not belong to the Q-M378 subclade, were used for comparison.
Additionally, targeted Y-chromosome sequencing data from five individuals tested at Full Genomes Corporation (FGC) 9 were analyzed.
9 https://www.fullgenomes.com/
Table 2. Information based on test participants' data at Full Genomes Corporation.
Sample code   Population       Verified origin
AJ1           Ashkenazi Jews   Eastern Europe
AJ2           Ashkenazi Jews   Eastern Europe
Ar1           Armenians        Eastern Turkey
Ir1           Iranians         Iran, Khuzestan province
Kz1           Kazakhs          Kazakhstan, kozha lineage
Genotyping
Data sets in BAM format (BAM/SAM Specification 10) and, in the case of PGP130, in TSV 11 format were used for the research.
Next-generation sequencing 12, performed by Full Genomes Corporation at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, had the following parameters: 50x coverage with paired-end reads of 100 base pairs. Mapped coverage was obtained for about 23 million base pairs out of the approximately 59 million base pairs present in a human Y chromosome.
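As a quick check of the figures above, the share of the Y chromosome with mapped coverage can be computed directly. This is only an arithmetic sketch: the two constants are the rounded values quoted in the text, and the function name is ours.

```python
# Rounded figures quoted in the text; both are approximations.
MAPPED_BP = 23_000_000   # base pairs with mapped coverage
CHR_Y_BP = 59_000_000    # approximate length of the human Y chromosome


def mapped_fraction(mapped_bp: int, total_bp: int) -> float:
    """Return the share of the chromosome covered by mapped reads."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return mapped_bp / total_bp


fraction = mapped_fraction(MAPPED_BP, CHR_Y_BP)
print(f"Mapped share of the Y chromosome: {fraction:.1%}")
```

With these rounded inputs, roughly two fifths of the chromosome has mapped coverage.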
Data processing and analysis
Clustering of Q-M378 subclade haplotypes (including haplotypes that belong to the upstream Q-L275 level and to downstream levels) was based on the processing of 222 haplotypes (67 STR markers 13) obtained from public sources 14. MURKA software 15 was used to construct the phylogenetic tree.
Full Y-chromosome sequencing data were processed and analyzed using FGC software along with software developed by the YFull research group 16.
Samples belonging to the Q-L275 subclade but lacking the M378 mutation were used as a reference, along with samples of upstream and parallel subclades on a case-by-case basis. Each sample was genotyped both for the SNPs discovered during the research and for the SNPs included in the ISOGG list under the Q-L275 subclade and its downstream subclades.
The presence of a mutation in more than two samples served as the criterion for the discovery of a new SNP, together with the consistency of the new SNPs with one another and with the previously known information on the phylogenetic structure of the respective subclade.
10 An up-to-date version of the specification can be found at https://github.com/samtools/hts-specs
11 TSV (Tab Separated Values): a text format for storing and viewing tabular data.
12 Behjati & Tarpey, What is next generation sequencing?, Arch Dis Child Educ Pract Ed 2013;98:236-238 doi:10.1136/archdischild-2013-304340 http://ep.bmj.com/content/98/6/236.full
13 STR markers (short tandem repeats).
14 Public project data from the Family Tree DNA website: http://www.familytreedna.com/projects.aspx. Hereinafter, haplotypes from this source are marked as follows: FTDNA kit and haplotype number.
15 MURKA by Valery Zaporozhchenko (Research Center of Medical Genetics of the Russian Academy of Medical Sciences, Moscow, Russia). http://sourceforge.net/projects/phylomurka/
16 http://www.yfull.com/
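The carrier-count part of the discovery criterion can be sketched as a simple filter. This is a minimal illustration, not the FGC or YFull software: the function name, the sample codes, and the variant labels are hypothetical, the threshold is a parameter taken literally from the wording above, and the consistency check against the known tree is not modeled.

```python
from collections import Counter


def shared_variants(calls, more_than=2):
    """Return variants whose derived allele is seen in more than `more_than` samples.

    `calls` maps a sample code to the set of variants called derived in it.
    Only the carrier count is checked; phylogenetic consistency is left out.
    """
    counts = Counter()
    for variants in calls.values():
        counts.update(set(variants))  # each sample counts once per variant
    return sorted(v for v, n in counts.items() if n > more_than)


# Hypothetical calls: sample codes and variant labels are placeholders.
calls = {
    "S1": {"var_a", "var_b", "var_c"},
    "S2": {"var_a", "var_b"},
    "S3": {"var_a", "var_b"},
    "S4": {"var_a"},
}
print(shared_variants(calls))
```

Variants carried by a single sample (here `var_c`) fail the filter, which corresponds to what the article calls private SNPs.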
Results
Clustering of Q-M378 subclade based on SNP and STR marker analysis
Given that SNPs assign haplotypes to clusters more specifically, the primary clustering took into account the known data on SNPs that define sublevels of the Q-M378 subclade.
Three downstream subclades are currently known 17: Q-L245, Q-L301, and Q-L327. The SNPs with an L prefix that define these subclades were identified at the Family Tree DNA lab led by Dr. Thomas Krahn.
The geography of Q-L245 distribution essentially repeats the geography of M378 distribution (except for Central and Southern Asia).
The Q-L301 subclade is localized exclusively in Iran 18. The simultaneous presence of the two subclades Q-L301 and Q-L245 among the autochthonous population of Iran and Iraq indicates that carriers of the M378 mutation have lived among the peoples of this region for a long time 19, 20.
L327 is a private SNP, represented by a single haplotype of a Portuguese man from the Azores 21.
Another private SNP 22 is P306, found in one Indian. It was not found among the tested representatives of Q-M378 subclades (including Q-L301) 23.
Until recently, only two SNPs were acknowledged as downstream of L245 24: L272.1, detected in Europe (Sicily), and L315, discovered in an East European Ashkenazi. SNP L619.2, discovered in two representatives of the Armenian Diaspora 25, is also located below L245. Furthermore, the relatively recent emergence of this SNP is confirmed by the existence of Armenian Diaspora representatives who show no sign of this polymorphism 26.
17 Y-DNA Haplogroup Q and its Subclades 2013 - http://www.isogg.org/tree/ISOGG_HapgrpQ.html
18 FTDNA kit 178026, M7540, M7949.
19 Nadia Al-Zahery et al., In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq (2011). http://www.biomedcentral.com/1471-2148/11/288 This work has some data on Q haplotypes present in the Marsh Arabs (n=143) and Iraqis (n=154). Q-M378 has a frequency of 2.1% in the first case and 1.9% in the second.
20 Grugni et al., Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians (2012). DOI: 10.1371/journal.pone.0041252. Among those positive for SNP M378, the following ethnic groups stand out: Khorasan Persians - 3 out of 59 (5.1%), Esfahan Persians - 1 out of 11 (9.1%), Lurs - 2 out of 50 (3.9%), Assyrians - 1 out of 39 (2.6%), Azerbaijani - 1 out of 63 (1.6%).
21 FTDNA kit 13254.
22 FTDNA kit N78873.
23 FTDNA kit 178026, M7540, 193005, 95307 respectively.
24 Both are private SNPs, i.e. each has so far been found in a single carrier of the mutation: L315 (FTDNA kit 51) and L272.1 (FTDNA kit 95307). L315 may not be stable, as it was positive in the HG02291 sample.
Consequently, until very recently the Q-L245 subclade could not be clustered using SNPs, so phylogenetic definitions and analysis of STR markers were used for clustering instead. The low-variability DYF395S1 segment of the Y chromosome 27 was used for clustering (the approach was initially proposed by Q yDNA Project 28 administrator Rebekah A. Canada), which allowed the formation of stable clusters with corresponding geographical and ethnic affiliations.
For example, the following clusters were identified using this approach.
DYF395S1=14-17
It includes four haplotypes: two Dagestanis (identifiers according to the cited publication 29: Avar Dag 511 and Kaitag Dag06 894), a Turk 30, and an Arab of Iraq 31. The latter belongs to the legendary tribe of Quraysh (Adnan-Modar tribal self-definition).
This cluster is located closer to the root of the L245 tree than any other and is apparently the nearest to the ancestral haplotype.
DYF395S1=15-17
It includes a whole group of haplotypes of people of various origins. The following subclusters can be distinguished within the cluster:
- Central European (most ancestral lineages are localized in Switzerland 32; part of them is linked to a Mennonite community);
- North European (most ancestral lineages are localized in the Netherlands 33);
- Italian (including haplotypes with partial SNP L272.1);
- Armenian;
- Southwest Asian.
It should be noted that, by the DYF395S1=15-17 attribute, a number of haplotypes without the L245 mutation would also fall into the cluster, in particular haplotypes of the level described below as Q-Y2250, as well as haplotypes of levels Q-L327 and Q-P306. However, since we give priority to SNPs in clustering, we do not include them. This also implies that the clusters DYF395S1=14-17 and/or 15-17 had already formed at the Q-M378 level. This hypothesis, however, can be made more specific only as the number of tested representatives of the cluster grows.
DYF395S1=15-18
DYF395S1=15-19
These two clusters are represented exclusively by people of Jewish origin.
Individual haplotypes with RecLOH (the so-called Recombinational Loss of Heterozygosity) in this part of the Y chromosome were not considered in this clustering.
Further research is expected to identify SNPs corresponding to each of the above STR-based clusters.
25 FTDNA kit E5340, 191379.
26 FTDNA kit 173902, 178717.
27 Vladislav Ryzhkov, Calculating time to the most recent common ancestor by separate panels of Y-STR markers, sorted by increasing mutation rate constants, The Russian Journal of Genetic Genealogy (Russian version): Vol. 3, No. 2, 2011, ISSN: 1920-2997 http://ru.rjgg.org
28 Q yDNA Project http://www.familytreedna.com/public/yDNA_Q/
29 Balanovsky et al., Parallel Evolution of Genes and Languages in the Caucasus Region. Molecular Biology and Evolution, 13 May 2011.
30 FTDNA kit 303617.
31 FTDNA kit 197506.
32 The SCHACKE surname appeared in Germany at least as early as the 1600s and perhaps earlier. The JAGGI surname in Switzerland goes back much further. With this DNA Project we hope to learn more about our early ancestors and where our ancestors originated. Johann Christoffel SCHACKE, the paternal ancestor of most who carry the SHOCKEY surname, was born in Kirchheimbolanden, Pfalz, Germany in 1720 to Swiss parents. He arrived in Philadelphia PA in 1737. The Anglicized version of his name became John Christopher Shockey. He and his wife Barbara had nine children between 1739 and 1756, six sons and three daughters. After Barbara died John Christopher married Anna Marie COMPTON. John Christopher and Anna Marie had one son born in 1774 or 1775. This project hopes to help identify the descendants of the seven sons of John Christopher SHOCKEY as well as learn more about his Swiss ancestors and their related families from Germany and/or Switzerland. http://www.familytreedna.com/public/shockey-schacke/default.aspx
33 Huff/Hough Surname Project - http://www.familytreedna.com/public/HOUGH/default.aspx A Dutchman named Derrick Pauluszen Hoff (1649-1730), who arrived in New Amsterdam (New York) no later than 1660, is considered to be the common ancestor of the family.
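The STR-based grouping described in this section can be sketched as a lookup on the DYF395S1 value. This is an illustration under assumptions: the kit identifiers and the value format ("15-17" as a string) are hypothetical, and values that match none of the four named clusters are simply left unassigned, in the same spirit as the haplotypes set aside in the text. The code does not decide whether such a value reflects RecLOH, and it does not apply the SNP-priority rule discussed above.

```python
from collections import defaultdict

# The four DYF395S1 clusters named in the text.
KNOWN_CLUSTERS = {(14, 17), (15, 17), (15, 18), (15, 19)}


def parse_dyf395s1(value):
    """Parse a value such as '15-17' into a sorted tuple of repeat counts."""
    first, second = (int(part) for part in value.split("-"))
    return tuple(sorted((first, second)))


def cluster_label(value):
    """Return the cluster label, or None if the value matches no named cluster."""
    alleles = parse_dyf395s1(value)
    if alleles in KNOWN_CLUSTERS:
        return f"DYF395S1={alleles[0]}-{alleles[1]}"
    return None


def group_by_cluster(haplotypes):
    """Group kit identifiers by cluster label (None collects unassigned kits)."""
    groups = defaultdict(list)
    for kit, value in haplotypes.items():
        groups[cluster_label(value)].append(kit)
    return dict(groups)


# Hypothetical kits and values, for illustration only.
haplotypes = {
    "kit_A": "14-17",
    "kit_B": "15-17",
    "kit_C": "15-17",
    "kit_D": "15-19",
    "kit_E": "15-15",
}
groups = group_by_cluster(haplotypes)
print(groups)
```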
New phylogenetic structure of Q-M378 subclade, upstream and parallel subclades
As a result of processing and analyzing the full Y-chromosome sequencing data, a number of new single nucleotide polymorphisms were discovered, and their positions on the Y chromosome (according to the hg19 human genome reference sequence 34) and their phylogenetic placements on the SNP tree were determined.
The data on the new SNPs are summarized in Tables 3-5 and Diagram 1, with SNPs named according to the Y notation 35 and the Full Genomes Corporation notation 36.
34 hg19 reference sequence or GRCh37. See also: Human Genome Overview. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
35 Y SNP prefix according to YFull.
36 FGC SNP prefix according to Full Genomes Corporation.
Diagram 1. Phylogenetic tree of Q-M378 subclade, upstream and parallel subclades.
Notes:
* SNPs included in the ISOGG SNP tree (2013).
* SNPs included by ISOGG in the list of "SNPs under Investigation" or mentioned in public sources.
* SNPs explored by the YFull team and/or the Full Genomes team.
* SNPs mentioned in public sources are marked in green.
As can be seen from the above, the following previously undescribed levels were discovered below the L275 SNP level:
1) Q-Y1150 level, downstream of Q-L275 and parallel to Q-M378. SNPs of this level were discovered in only three natives of Hindustan (HG03914, HG03652, HG03864) 37.
2) Q-Y2250 level, downstream of Q-M378 and parallel to Q-L245. SNPs of this level (Table 3) were found in the Ir1 and Kz1 samples. Since the Ir1 sample is positive for SNP L301 and Kz1 is negative for it, the Q-L301 level is evidently downstream of Q-Y2250. Private SNPs of the Kz1 sample are listed in Appendix 3. Private SNPs of the Ir1 sample are listed in Appendix 7.
3) Q-Y2220 level, downstream of Q-L245. This level combines haplotypes of the Jewish and Armenian clusters of Q-L245. All tested samples representing these clusters (AJ1, AJ2, Ar1) were positive for SNPs of this level (see Table 4), with the exception of the PGP130 sample (Moroccan origin).
4) There is also a Q-Y2220 (xQ-Y2200) level, parallel to Q-Y2200, which contains SNPs defining the Armenian segment of the DYF395S1=15-17 cluster. Because these SNPs were found in only one sample (Ar1), they have the status of private SNPs. Nevertheless, the following can be assumed with high probability:
- that part of these SNPs will characterize a rather wide range of haplotypes of the DYF395S1=15-17 cluster;
- that the Q-L619.2 level will be downstream of Q-Y2220 (xQ-Y2200), since only a part of the Armenians who are positive for SNP L245 belong to it. The Ar1 sample tested by us showed no sign of the L619.2 mutation.
5) Q-Y2200 level, downstream of Q-Y2220. SNPs of this level define the Jewish cluster of Q-L245 (see Table 5). Private SNPs of samples AJ1 and AJ2 are listed in Appendices 5 and 6. In addition, neither tested sample had the L315 mutation.
37 G.R. Magoon, R.H. Banks, C. Rottensteiner, B.E. Schrack, V.O. Tilroe, T. Robb, A.J. Grierson, Generation of high-resolution a priori Y-chromosome phylogenies using next-generation sequencing data, 2013, doi:10.1101/00802 (in preparation, preprint on bioRxiv.org).
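The reasoning used in the list above (a level whose carriers form a proper subset of another level's carriers lies downstream of it) can be sketched as follows. This is an illustration, not the authors' software. The carrier sets are our reading of the text; in particular, treating PGP130 as positive for M378 and L245 but negative for Y2220 is an assumption based on item 3, and the Q-Y1150 level with the three HG samples is left out for brevity.

```python
def infer_parents(carriers):
    """Map each SNP level to its nearest upstream level.

    `carriers` maps a level name to the set of samples positive for it.
    The parent is the level with the smallest carrier set that is a proper
    superset of the level's own carriers; None marks the root.
    """
    parents = {}
    for snp, samples in carriers.items():
        supersets = [
            (len(other), name)
            for name, other in carriers.items()
            if samples < other  # proper subset test
        ]
        parents[snp] = min(supersets)[1] if supersets else None
    return parents


# Carrier sets as read from the text (PGP130 placement is an assumption).
carriers = {
    "M378": {"AJ1", "AJ2", "Ar1", "Ir1", "Kz1", "PGP130"},
    "Y2250": {"Ir1", "Kz1"},
    "L301": {"Ir1"},
    "L245": {"AJ1", "AJ2", "Ar1", "PGP130"},
    "Y2220": {"AJ1", "AJ2", "Ar1"},
    "Y2200": {"AJ1", "AJ2"},
}
parents = infer_parents(carriers)
for snp, parent in parents.items():
    print(f"{snp} <- {parent}")
```

Levels with identical carrier sets cannot be ordered this way; they correspond to what the article calls SNPs at the same level.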
Table 3. Q-Y2250 level. New SNPs, downstream of positive SNP M378.
Position (hg19)  Ancestral value  Derived value  SNP name (Y)  SNP name (FGC) or synonym
7115834          C                T              Y2244         FGC4626
6894323          C                T              Y2245         PR683
3544336          C                G              Y2246         FGC4613
2765038          T                G              Y2247         FGC4607
4070598          G                A              Y2248         FGC4618
4242831          A                G              Y2249         FGC4619
4852955          G                A              Y2250         FGC4620
6537988          A                G              Y2251         FGC4624
6724553          C                T              Y2252
8671530          A                G              Y2255         FGC4631
10077457         T                C              Y2256         FGC4635
15766997         A                C              Y2263         FGC4646
18169503         A                C              Y2264         FGC4656
18803364         C                T              Y2265         FGC4657
18990293         A                G              Y2266         FGC4659
22525954         AT               A              Y2268
23956540         A                T              Y2269         FGC4675
24452225         G                C              Y2270         FGC4676
15684681         A                T                            CTS4507
13643442         T                C                            FGC4638

Note: Y2268 is a deletion.
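The note on Y2268 follows from the lengths of the ancestral and derived values. A minimal sketch of that classification is given below; the function names are ours, the three rows are copied from Table 3, and the transition/transversion helper is an extra illustration rather than something the article reports.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variant_type(ancestral, derived):
    """Classify a variant by comparing allele lengths."""
    if len(ancestral) == 1 and len(derived) == 1:
        return "SNV"
    if len(ancestral) > len(derived):
        return "deletion"
    if len(ancestral) < len(derived):
        return "insertion"
    return "complex"


def is_transition(ancestral, derived):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ancestral, derived}
    if len(pair) != 2:
        return False
    return pair <= PURINES or pair <= PYRIMIDINES


# Three rows taken from Table 3: (SNP name, ancestral value, derived value).
rows = [("Y2244", "C", "T"), ("Y2246", "C", "G"), ("Y2268", "AT", "A")]
for name, ancestral, derived in rows:
    print(name, variant_type(ancestral, derived))
```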
Table 4. Q-Y2220 level. New SNPs, downstream of positive SNP L245.
Position (hg19)  Ancestral value  Derived value  SNP name (Y)  SNP name (FGC)
9408770          G                T              Y2220         FGC1904
18051798         A                C              Y2209         FGC1917
22017904         G                T              Y2202         FGC1925
4914530          A                G              Y2229
Table 5. Q-Y2200 level. New SNPs, downstream of positive SNP L245.
Position (hg19)  Ancestral value  Derived value  SNP name (Y)  SNP name (FGC)
23646920         C                T              Y2196         FGC1934
22953894         A                G              Y2197         FGC1933
22825080         A                G              Y2198         FGC1932
22588598         C                T              Y2200         FGC1929
22471554         A                T              Y2201         FGC1928
21277083         G                A              Y2203         FGC1923
19425984         G                A              Y2206
19053060         C                T              Y2207         FGC1919
18207170         A                G              Y2208         FGC1918
18046486         T                C              Y2210         FGC1916
18043999         G                A              Y2211         FGC1915
16994660         T                A              Y2212         FGC1914
15834557         G                A              Y2213         FGC1912
14385853         T                G              Y2215         FGC1911
14353022         A                C              Y2216         FGC1910
14184253         C                A              Y2218         FGC1909
9892635          C                T              Y2219         FGC1906
9401947          C                A              Y2221         FGC1903
8662585          C                A              Y2224         FGC1899
6949449          C                T              Y2225         FGC1897
4606181          C                T              Y2231         FGC1890
3995524          G                A              Y2232         FGC1888
3148720          A                G              Y2233         FGC1886
The placement of SNPs listed by ISOGG as SNPs under Investigation was specified within the scope of this work: F108, F803, F815, F1082, F1126, F1169, F1213, F1337, F1349, F1528, F1537, F1594, F1734, F1780, F1836, F1839, F1858, F1875, F1974, F2023, F2145, F2230, F2313, F2343, F2440, F2628, F2657, F2777, F2851, F2877, F2894, F2934, F3084, F3121, F3193, F3207, F3389, F3621, F3680. On May 8, 2013, ISOGG classified all of the above SNPs as belonging to level L245 or below. Our analysis showed that the proposed scheme needs to be modified. All of these SNPs except F1213, F1349, F1594, F1734, F1780, F1836, F1839, F2230, and F2877 belong to level Q-L275, as they are positive in samples HG03914, HG03652, HG03864, AJ1, AJ2, Ar1, and Ir1. The remaining SNPs, in turn, are positive in all samples in the research that are positive for M378 and L245. Consequently, these two groups of SNPs are at the same level as Q-L275 and Q-M378, respectively 38.
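The split described in this paragraph amounts to a set difference between the full list of SNPs and the nine exceptions. The sketch below only restates the two lists from the text and counts the result; the variable names are ours.

```python
# SNPs listed by ISOGG as "SNPs under Investigation", as enumerated in the text.
UNDER_INVESTIGATION = """
F108 F803 F815 F1082 F1126 F1169 F1213 F1337 F1349 F1528 F1537 F1594 F1734
F1780 F1836 F1839 F1858 F1875 F1974 F2023 F2145 F2230 F2313 F2343 F2440 F2628
F2657 F2777 F2851 F2877 F2894 F2934 F3084 F3121 F3193 F3207 F3389 F3621 F3680
""".split()

# The exceptions named in the text, placed at the same level as Q-M378.
M378_LEVEL = {
    "F1213", "F1349", "F1594", "F1734", "F1780",
    "F1836", "F1839", "F2230", "F2877",
}

# Everything else is placed at the same level as Q-L275.
l275_level = [snp for snp in UNDER_INVESTIGATION if snp not in M378_LEVEL]

print(len(UNDER_INVESTIGATION), "SNPs in total")
print(len(l275_level), "at the Q-L275 level")
print(len(M378_LEVEL), "at the Q-M378 level")
```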
In addition, a considerable number of new SNPs were discovered at the same level as L275, M378, and L245.
For example, the following SNPs belong to level Q-L275: Y1014-Y1022, Y1024-Y1057, Y1059-Y1069, Y1071-Y1137, Y1139, Y1142, Y1153, Y1160, Y1164, Y1166, Y1167, Y1169, Y1195, Y1220, Y1240, Y1978-Y1983, Y1985-Y1989, Y1991-Y1993, Y1995, Y1996-Y1997, Y2003, Y2005-Y2007, Y2009, Y2239, Y2243.
These SNPs currently have no phylogenetic meaning, but one may be assigned to them later, after full sequencing of samples that belong to these levels and lack the SNP mutations defining downstream levels.
38 It should be noted that the FTDNA research team led by Dr. Thomas Krahn, with the participation of Q yDNA Project administrator Rebekah A. Canada, came to a similar conclusion earlier. The respective data can be found on the draft SNP tree page of the Family Tree DNA website: http://ytree.ftdna.com/index.php?name=Draft&parent=31182976 No justification of these conclusions was published, but presumably samples tested under the National Geographic Geno 2.0 project were used for the analysis.
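The compact range notation used in the Q-L275 list above (for example, Y1014-Y1022) can be expanded into individual SNP names with a small helper. This is a convenience sketch with hypothetical function names; it assumes that both ends of a range share the same letter prefix and that names carry no leading zeros.

```python
import re

_RANGE = re.compile(r"([A-Za-z]+)(\d+)-([A-Za-z]+)(\d+)")


def expand(token):
    """Expand 'Y1014-Y1022' into individual names; pass single names through."""
    token = token.strip()
    match = _RANGE.fullmatch(token)
    if match is None:
        return [token]
    prefix, start, end_prefix, end = match.groups()
    if prefix != end_prefix or int(end) < int(start):
        raise ValueError(f"malformed range: {token}")
    return [f"{prefix}{n}" for n in range(int(start), int(end) + 1)]


def expand_list(text):
    """Expand a comma-separated list of SNP names and ranges."""
    names = []
    for token in text.split(","):
        names.extend(expand(token))
    return names


# A short excerpt of the list in the text.
excerpt = "Y1014-Y1022, Y1139, Y1978-Y1983"
names = expand_list(excerpt)
print(len(names), names[0], names[-1])
```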
Summary
The research demonstrated the high efficiency of full Y-chromosome sequencing for defining phylogenetic structure and made it possible to form a consistent phylogenetic structure of the Q-M378 subclade, confirmed by analysis of SNP and STR markers.
As part of the research, the new phylogenetic levels Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245), and Q-Y2200 (downstream of Q-Y2220) were defined. SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also identified.
The research confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language carriers from Central Asia via Afghanistan and Iran to the West. That said, the material currently at the researchers' disposal is not sufficient to form a complete picture of these migration processes. This task can be resolved in the near future as statistically significant data accumulate.
Acknowledgements
The authors of the article wish to thank the following people, who rendered their assistance in its preparation and conducting the research:
Appendix 1. Table 6. SNPs at the same level with M378.
Position (hg19)  Ancestral value  Derived value  SNP name (Y) or synonym  SNP name (FGC)
2806676          A                G              Y2012                    FGC1770
3111159          G                C              Y2013                    FGC1758
3815203          G                C              Y2016                    FGC1774
3929337          C                A              Y2017                    FGC1988
4234101          A                G              Y2018                    FGC1775
4332151          G                A              Y2019                    FGC1776
4634427          C                A              Y2020                    FGC1777
4775787          T                C              Y2021                    FGC1779
4778576          A                G              Y2022                    FGC1780
4783438          T                C              Y2023
4961249          C                A              Y2024                    FGC1781
5011266          A                G              Y2025
5266522          A                G              Y2026                    FGC1782
5496739          A                C              Y2027                    FGC1783
5687522          T                A              Y2028                    FGC1784
5751055          T                G              Y2029                    FGC1785
5872168          C                T              Y2226
5963558          G                A              Y2030
6085717          C                A              Y2031                    FGC1788
6430659          T                G              Y2032                    FGC1789
6617825          T                C              Y2033                    FGC1790
6618215          T                C              Y2034                    FGC1791
6746675          T                C              Y2035                    FGC1792
6774328          T                C              Y2036                    FGC1793
6986250          T                C              Y2037                    FGC1794
7045044          C                T              Y2038                    FGC1795
7071796          C                G              Y2039                    FGC1796
7094691          A                G              Y2040                    FGC1797
7159039          C                G              Y2041                    FGC1798
7160439          G                A              Y2042                    FGC1799
7339849          G                T              Y2043                    FGC1801
7431253          C                T              Y2044                    FGC1803
7437821          C                G              Y2045                    FGC1804
7550568          G                C              Y2046                    FGC1805
7652630          G                A              Y2047
7778164          G                A              Y2048                    FGC1807
7856334          A                G              Y2049                    FGC1808
7952263          C                T              Y2050                    FGC1809
8067818          C                G              Y2051                    FGC1810
8681004          T                C              Y2052                    FGC1812
8682184          C                T              Y2053                    FGC1813
8821295          A                G              Y2054                    FGC1814
9074666          C                T              Y2055                    FGC1815
9170505          G                T              Y2056                    FGC1817
13127815         A                G              Y2057                    FGC1818
13928638         G                C              Y2058                    FGC1820
14017272         A                G              Y2059                    FGC1825
14193680         G                A              Y2060                    FGC1827
14293849         T                A              Y2061                    FGC1830
14435779         A                G              Y2062                    FGC1833
14540558         C                T              Y2063                    FGC1834
14674385         C                T              Y2064                    FGC1835
14733633         C                A              Y2065                    FGC1836
15498011         C                A              Y2066
15521110         T                C              Y2067                    FGC1838
15699493         C                T              Y2068                    FGC1841
16217389         A                AT             Y2069
16654310         C                G              Y2070                    FGC1842
16678163         C                T              Y2071                    FGC1843
17230548         G                A              Y2072                    FGC1844
17447489         C                T              Y2073                    FGC1845
17959860         A                G              Y2074                    FGC1850
18243302         C                T              Y2075                    FGC1852
18714407         C                A              Y2076                    FGC1854
18768735         G                T              Y2077
18768736         C                A              Y2078
18769454         A                G              Y2079                    FGC1767
18803642         T                G              Y2080                    FGC1855
18856911         G                C              Y2081                    FGC1856
19373808         A                T              Y2082                    FGC1858
21365952         G                A              Y2084                    FGC1861
21479863         G                A              Y2085                    FGC1862
21647670         G                C              Y2086                    FGC1863
21832029         C                A              Y2087                    FGC1864
22022365         A                G              Y2088                    FGC1865
22101157         C                T              Y2089                    FGC1866
22440644         G                A              Y2361
22624047         G                A              Y2090                    FGC1768
22931328         T                A              Y2091                    FGC1869
23053626         A                G              Y2092                    FGC1872
23078557         G                T              Y2093                    FGC1873
23166596         T                C              Y2094                    FGC1874
23279919         G                T              Y2095                    FGC1875
23566714         C                T              Y2097                    FGC1877
23615574         AT               A              Y2098
28516009         A                T              Y2113
28593688         T                C              Y2114
28687807         A                G              Y2115

Note: Y2098 is a deletion; Y2069 is an insertion.
Appendix 2. Table 7. SNPs at the same level with L245.
Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)
2794289 | C | G | Y2116 | FGC1987
3127708 | T | C | Y2117 | FGC1771
3709585 | A | C | Y2118 | FGC1773
4502969 | T | C | Y2119 | FGC1759
4671322 | C | A | Y2120 | FGC1778
7219594 | T | C | Y2121 | FGC1800
7408851 | C | A | Y2122 | FGC1802
7590793 | C | T | Y2123 | FGC1806
8614513 | C | G | Y2124 | FGC1811
9144039 | A | T | Y2223 | FGC1901
9382621 | G | T | Y2222 | FGC1902
9798919 | G | A | Y2125 | FGC1816
13956388 | G | A | Y2126 | FGC1821
13982835 | C | T | Y2127 | FGC1823
14012662 | G | A | Y2128 | FGC1824
14045736 | T | C | Y2129 | FGC1826
14202870 | A | G | Y2130 | FGC1828
14285880 | C | G | Y2131 | FGC1829
14296099 | C | A | Y2217 | FGC1831
14402304 | G | A | Y2132 | FGC1832
15569048 | C | T | Y2133 | FGC1839
15614105 | C | G | Y2134 | FGC1840
16519324 | A | G | Y2135 | -
16757414 | G | GA | Y2237 | -
17686482 | T | C | Y2136 | FGC1846
17686883 | A | G | Y2137 | FGC1847
17763793 | T | A | Y2138 | FGC1848
17860015 | G | T | Y2139 | FGC1849
18134822 | T | C | Y2140 | FGC1851
18575106 | G | A | Y2141 | FGC1853
19300050 | C | T | Y2142 | FGC1857
21118566 | T | C | Y2143 | FGC1859
22015887 | C | A | Y2144 | FGC1989
22934317 | ATC | A | Y2235 | -
23010582 | C | T | Y2145 | FGC1870
23042385 | C | A | Y2146 | FGC1871
23648959 | T | G | Y2147 | FGC1878
23733052 | A | G | Y2148 | FGC1879
28520821 | A | G | Y2149 | -
28646637 | C | G | Y2195 | FGC1883
22767464 | G | A | Y2199 | FGC1868
21235857 | A | G | Y2204 | FGC1860
________________________
*Note: Y2235 is a deletion; Y2237 is an insertion.
Appendix 3. Table 8. Private SNPs for Kz1 sample.
Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)
2980949 | T | C | YFS026208 | -
3027441 | C | A | YFS026210 | FGC4858
3751684 | G | A | YFS026242 | FGC4859
4164029 | A | G | YFS026250 | FGC4860
4515848 | G | A | YFS026257 | FGC4862
4714529 | G | T | YFS026264 | FGC4864
5394870 | T | C | YFS026279 | FGC4865
5398133 | A | T | YFS026280 | FGC4866
6088200 | T | C | YFS026301 | FGC4867
6675390 | A | G | YFS026321 | FGC4868
7058898 | G | A | YFS026329 | FGC4869
7208802 | C | T | YFS026339 | FGC4870
7278041 | G | A | YFS026340 | FGC4871
7704050 | C | T | YFS026351 | FGC4856
7929100 | A | C | YFS026356 | FGC4872
8268654 | G | A | YFS026361 | FGC4873
8684090 | G | A | YFS026366 | FGC4874
8714870 | C | T | YFS026367 | FGC4875
9154952 | G | A | YFS026372 | FGC4876
9990725 | C | G | - | FGC4878
13230336 | G | A | - | FGC4879
13313894 | G | C | - | FGC4880
13637299 | G | A | - | FGC4881
14599760 | G | A | YFS026426 | FGC4882
15353330 | C | T | YFS026439 | FGC4883
15540398 | G | A | YFS026445 | FGC4884
15617600 | G | A | YFS026447 | FGC4885
15656595 | A | C | YFS026448 | -
15881099 | G | A | YFS026457 | FGC4886
17344441 | A | G | YFS026496 | FGC4887
17455705 | C | G | YFS026499 | FGC4888
17619239 | A | C | YFS026502 | FGC4889
18132430 | T | A | YFS026506 | FGC4890
18205189 | C | A | YFS026508 | FGC4891
18235952 | C | A | YFS026509 | FGC4892
18427622 | C | T | YFS026514 | FGC4893
18699065 | G | A | YFS026522 | FGC4894
19119009 | G | A | YFS026534 | FGC4895
21794826 | T | C | YFS026585 | FGC4896
21824228 | C | T | YFS026586 | FGC4897
22216997 | C | A | YFS026594 | FGC4898
22263424 | G | T | - | FGC4899
22464918 | G | A | YFS029304 | -
22470401 | G | T | YFS029305 | FGC4901
22476862 | T | A | - | FGC4902
22779292 | G | A | YFS026598 | FGC4904
22845858 | T | A | YFS026600 | FGC4905
22980932 | G | A | YFS026603 | FGC4906
23097922 | G | T | YFS026606 | FGC4907
23188736 | C | T | YFS026608 | FGC4908
23574588 | G | T | YFS026618 | FGC4909
28577678 | T | G | - | FGC4857
28556325 | T | G | YFS026709 | -
Appendix 4. Table 9. Private SNPs for Ar1 sample.
Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)
2837084 | G | A | YFS030295 | -
4687602 | C | T | YFS030307 | -
3264534 | G | T | YFS030298 | -
3692600 | G | A | YFS030300 | -
6849037 | A | G | YFS030309 | -
7389018 | T | C | YFS030314 | -
7809088 | C | T | YFS030318 | FGC2000
8227956 | C | T | YFS030321 | FGC2001
8310172 | G | A | YFS030322 | FGC2002
8891034 | A | G | YFS030324 | FGC2003
9455617 | G | C | YFS030326 | FGC2004
9507128 | G | A | YFS030327 | FGC2005
13207417 | C | T | - | FGC2006
13862984 | G | A | YFS030335 | FGC2007
14037704 | A | G | YFS030339 | FGC2008
14266100 | G | A | YFS030343 | -
14271743 | G | T | YFS030344 | FGC2009
14645998 | A | T | YFS030350 | -
15487465 | T | C | YFS030354 | FGC2010
15532493 | G | C | YFS030355 | FGC2011
15562737 | G | A | YFS030356 | FGC2012
15649426 | C | G | YFS030357 | -
15949197 | C | T | YFS030358 | FGC2013
16033272 | G | A | YFS030359 | FGC2014
16914913 | A | T | YFS030368 | -
17143642 | G | A | YFS030370 | FGC2015
17264341 | C | T | YFS030371 | FGC2016
17350212 | G | T | YFS030372 | FGC2017
17468836 | G | A | YFS030374 | FGC2018
17522056 | C | A | YFS030375 | FGC2019
17547056 | C | T | YFS030376 | FGC1986
17969724 | T | C | YFS030377 | FGC2020
18005360 | G | A | YFS030378 | FGC2021
18082500 | T | C | YFS030379 | FGC2022
18143358 | C | T | YFS030380 | -
18269281 | T | C | YFS030381 | FGC2023
19295864 | G | A | YFS030386 | FGC2024
19305808 | C | G | YFS030387 | FGC2025
21920836 | G | T | YFS030396 | FGC2026
22195671 | T | G | YFS030398 | FGC2027
22546195 | T | C | YFS030431 | FGC2029
23036871 | A | C | YFS030432 | FGC2030
23193319 | C | G | YFS030433 | FGC2031
23633830 | T | C | YFS030434 | FGC2032
23749442 | C | G | YFS030435 | FGC2033
23952561 | G | A | YFS030438 | FGC2034
28546577 | A | G | YFS030460 | FGC2035
28697215 | C | T | YFS030463 | FGC2036
28728861 | A | G | YFS030465 | FGC2037
28773229 | G | A | YFS030466 | FGC2038
Appendix 5. Table 10. Private SNPs for AJ1 sample.
Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)
3014878 | G | C | YFS028077 | -
3279492 | T | C | YFS028084 | -
4705139 | G | A | YFS028121 | -
4734829 | G | T | YFS028122 | -
5007712 | T | C | YFS028135 | -
6028097 | T | C | YFS028158 | FGC4835
6671453 | T | A | YFS028174 | -
6985833 | G | C | YFS028180 | FGC4836
7116693 | C | G | YFS028187 | FGC4837
13225084 | C | A | - | FGC4839
13227006 | C | T | - | FGC4840
14174284 | C | T | YFS028277 | FGC4841
14683323 | G | A | YFS028303 | -
15749472 | C | G | YFS028328 | FGC4842
15911171 | T | A | YFS028333 | FGC4843
17216758 | C | G | YFS028365 | FGC4844
17842405 | G | A | YFS028379 | FGC4845
18697269 | A | G | YFS028399 | FGC4846
22541678 | G | A | YFS028484 | -
22545510 | G | T | YFS028485 | FGC4850
22809218 | A | T | YFS028490 | FGC4851
22816094 | C | T | YFS028491 | FGC4852
22989959 | T | C | YFS028498 | FGC4853
23338485 | T | C | YFS028509 | FGC4854
Appendix 6. Table 11. Private SNPs for AJ2 sample.
Position (hg19) | Ancestral value | Derived value | SNP name (Y) | SNP name (FGC)
3085515 | C | A | YFS030088 | FGC1885
4157714 | C | T | YFS030093 | FGC1889
7357489 | C | T | YFS030117 | FGC1898
8757232 | C | A | YFS030130 | FGC1900
9761433 | C | T | YFS030140 | FGC1924
16933881 | C | T | YFS030164 | FGC1913
19228285 | T | C | YFS030189 | FGC1920
21322098 | A | G | YFS030210 | FGC1924
22128896 | C | T | YFS030218 | FGC1926
22612418 | A | T | YFS030247 | FGC1930
22720359 | C | T | YFS030248 | FGC1931
Appendix 7. Table 12. Private SNPs for Ir1 sample.
Position (hg19) | Ancestral value | Derived value | SNP name (Y)
2808294 | G | A | YFS030486
2848925 | C | T | YFS030487
3241019 | G | A | YFS030493
3331565 | C | T | YFS030495
3617298 | G | A | YFS030498
3905106 | T | C | YFS030501
3983695 | G | A | YFS030503
4048861 | C | G | YFS030505
4976524 | T | C | YFS030521
4976526 | T | C | YFS030522
5021496 | G | C | YFS030523
5219277 | T | A | YFS030526
5844571 | C | T | YFS030529
6531744 | G | A | YFS030531
7398730 | T | C | YFS030543
7685828 | G | T | YFS030547
7997281 | G | C | YFS030548
8350958 | G | A | YFS030550
8482074 | C | G | YFS030551
8874735 | C | A | YFS030553
9459692 | A | G | YFS030555
9832592 | A | G | YFS030556
14022660 | C | A | YFS030564
14273656 | A | G | YFS030573
14401614 | C | T | YFS030575
14532575 | G | T | YFS030582
14916116 | G | A | YFS030585
14996654 | G | A | YFS030588
15012864 | C | A | YFS030589
15240341 | G | C | YFS030591
15799031 | G | C | YFS030596
15933501 | T | A | YFS030599
16253494 | C | T | YFS030602
16280147 | C | T | YFS030603
16304710 | T | C | YFS030604
16875622 | C | T | YFS030608
17529042 | G | A | YFS030616
18106050 | C | T | YFS030618
18903761 | A | C | YFS030626
19157289 | G | A | YFS030633
19198307 | A | T | YFS030634
19526472 | A | C | YFS030637
21359025 | C | G | YFS030656
21567329 | G | A | YFS030657
22564450 | C | T | YFS030684
22621906 | G | T | YFS030685
22687343 | A | T | YFS030686
22910874 | G | A | YFS030688
23018638 | T | C | YFS030689
23054174 | T | G | YFS030690
23198785 | A | T | YFS030691
23435852 | A | C | YFS030694
24484883 | T | C | YFS030706
28759876 | C | T | YFS030732
17188634 | T | C | YFS030609
19001468 | C | T | YFS030630
20534862 | T | C | YFS030645
21599239 | A | G | YFS030658
21836635 | A | T | YFS030661