A. P. Jason de Koning, Wanjun Gu, Todd A. Castoe, Mark A. Batzer, David D. Pollock
Diego Halab Robles, Doctoral Program in Medical Sciences, Universidad Austral de Chile
Background
RepeatMasker (RM) is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences.
Limitations: it does not detect repeats per se (only similarity to known repeat consensus sequences); poor sensitivity; time-consuming.
Probability Clouds
Previous studies (Gu et al., 2008) of two human chromosomes indicated that a reasonably large fraction of the human genome is likely of repetitive origin but is not annotated by RM.
Aim: to analyze the reliability of P-clouds and RM for identifying fragments of different sizes from two large, well-known families of human SINEs: Alu and MIR.
Methods
P-clouds algorithm
Oligonucleotide counts → clusters (P-clouds) → regions with a high density of P-clouds
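The three-step flow above (count oligos, cluster them into clouds, annotate dense regions) can be illustrated with a toy Python sketch. This is a deliberately simplified, hypothetical version, not the published P-clouds implementation: the greedy clustering rule and the parameter names `core_min`, `member_min`, `max_dist`, `window` and `min_density` are assumptions chosen for illustration only.

```python
from collections import Counter


def count_oligos(seq: str, k: int) -> Counter:
    """Count every overlapping oligonucleotide (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length oligos."""
    return sum(x != y for x, y in zip(a, b))


def build_clouds(counts: Counter, core_min: int, member_min: int, max_dist: int) -> list:
    """Hypothetical greedy clustering: each unassigned oligo with count >= core_min
    seeds a cloud, which absorbs unassigned oligos with count >= member_min
    lying within max_dist substitutions of the seed."""
    clouds, assigned = [], set()
    for core, n in counts.most_common():
        if n < core_min:
            break  # most_common() is sorted by count, so no later oligo can seed a cloud
        if core in assigned:
            continue
        members = {o for o, m in counts.items()
                   if m >= member_min and o not in assigned and hamming(core, o) <= max_dist}
        members.add(core)
        assigned |= members
        clouds.append(members)
    return clouds


def annotate(seq: str, cloud_oligos: set, k: int, window: int, min_density: float) -> list:
    """Return merged (start, end) regions, 0-based half-open, where the fraction of
    positions covered by P-cloud oligos inside a sliding window reaches min_density."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in cloud_oligos:
            for j in range(i, i + k):
                covered[j] = True
    regions = []
    for start in range(len(seq) - window + 1):
        if sum(covered[start:start + window]) / window >= min_density:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the overlapping region
            else:
                regions.append((start, end))
    return regions
```

For example, `annotate("CCCCAAAACCCC", {"AAAA"}, 4, 4, 1.0)` flags only the `AAAA` stretch. On real data `k` would be much larger (the slides report an oligo length of 16), and a pure-Python scan like this would be far too slow for a whole genome.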
False-positive assessment (oligonucleotide length = 16; cutoff distance = 3)
Assessment of overlaps between P-clouds annotations (BED files, UCSC Genome Browser)
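Assessing how a P-clouds annotation overlaps another track (for example RM) comes down to interval arithmetic on BED records, which use 0-based, half-open coordinates. The sketch below is a hypothetical helper, not the tooling used in the study: it merges each track per chromosome and reports the fraction of query bases that also fall in the reference track.

```python
from collections import defaultdict


def merge(intervals):
    """Sort and merge overlapping or touching (start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def read_bed(lines):
    """Parse BED lines (chrom, start, end, ...) into merged intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return {c: merge(ivs) for c, ivs in by_chrom.items()}


def overlap_bp(a, b):
    """Total bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_overlapping(query, reference):
    """Fraction of bases annotated in `query` that are also annotated in `reference`."""
    shared = sum(overlap_bp(ivs, reference.get(c, [])) for c, ivs in query.items())
    total = sum(e - s for ivs in query.values() for s, e in ivs)
    return shared / total if total else 0.0
```

In practice, dedicated tools such as the UCSC Genome Browser's table operations perform the same kind of intersection on full-genome tracks.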
Detection capability of P-clouds and RM for fragments of known elements: Alu (286 bp; fragments of 30, 40 and 50 bp) and MIR (260 bp; fragments of 30, 50, 80, 100, 150 and 200 bp). P-clouds setting: C10, 80%
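The fragment test above can be sketched by cutting a full-length element into fragments of each size and recording the fraction flagged by a detector. This is a hypothetical harness: `detector` stands in for a call to P-clouds or RM, and non-overlapping tiling (dropping the short tail) is an assumed design, not necessarily how the study generated its fragments.

```python
from typing import Callable, Dict, Iterable, List


def fragment(seq: str, size: int) -> List[str]:
    """Split an element into consecutive, non-overlapping fragments of `size` bp;
    a trailing piece shorter than `size` is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def detection_rate(seq: str, sizes: Iterable[int],
                   detector: Callable[[str], bool]) -> Dict[int, float]:
    """For each fragment size, return the fraction of fragments the detector flags."""
    rates = {}
    for size in sizes:
        frags = fragment(seq, size)
        hits = sum(1 for f in frags if detector(f))
        rates[size] = hits / len(frags) if frags else 0.0
    return rates
```

With a 286 bp sequence and `size=50`, `fragment` yields five fragments; the remaining 36 bp tail is ignored.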
Element-specific P-clouds (ESPs) for the specific annotation of novel Alu and MIR elements
Validation of ESP predictions (BLAST)
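To illustrate the idea of element-specific annotation, the sketch below builds one oligo set per family from known member sequences and assigns a query to the family whose oligos cover the largest share of its k-mers. This is a simplified, hypothetical scheme (exact k-mer matching with a `min_fraction` threshold), not the ESP procedure from the paper; the family names and sequences in any usage are toy placeholders.

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct overlapping k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_esps(families: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """One oligo set per family, pooled from that family's example sequences."""
    return {name: set().union(*(kmers(s, k) for s in seqs))
            for name, seqs in families.items()}


def classify(query: str, esps: Dict[str, Set[str]], k: int,
             min_fraction: float = 0.5) -> Optional[str]:
    """Return the family covering the largest fraction of the query's k-mers,
    or None if no family reaches min_fraction."""
    q = kmers(query, k)
    if not q:
        return None
    best, best_frac = None, 0.0
    for name, oligos in esps.items():
        frac = len(q & oligos) / len(q)
        if frac > best_frac:
            best, best_frac = name, frac
    return best if best_frac >= min_fraction else None
```

Predictions from a scheme like this would still need independent confirmation, which is the role BLAST plays in the validation step above.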
Results
Weaknesses
Lack of a gold standard
Still weak at detecting more divergent sequences
Conclusions
Combined P-clouds and RM analysis of the human genome indicates that it consists of at least 66–69% repetitive sequence.
P-clouds probably still underestimate the true genomic transposable element (TE) content.
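A combined estimate counts each base once even when both methods annotate it, so it is a union of annotations rather than a sum: coverage = (|P| + |R| − |P ∩ R|) / G. The sketch below computes this for a single sequence of length G; it is an illustration with toy intervals, not the study's genome-wide calculation.

```python
def merge(intervals):
    """Sort and merge overlapping or touching (start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def union_fraction(track_a, track_b, genome_length):
    """Fraction of the sequence covered by either annotation (overlaps counted once)."""
    merged = merge(list(track_a) + list(track_b))
    return sum(e - s for s, e in merged) / genome_length
```

For example, intervals `[(0, 40), (60, 70)]` and `[(30, 50)]` on a 100 bp sequence give 0.6. Simply adding the two tracks (50 + 20 = 70 bp) would overcount the 10 bp they share.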
References
A. P. J. de Koning, W. Gu, T. A. Castoe, et al., 2011. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genetics, 7(12), e1002384.
W. Gu, T. A. Castoe, D. J. Hedges, et al., 2008. Identification of repeat structure in large genomes using repeat probability clouds. Analytical Biochemistry, 380(1), pp. 77–83.