Many projects require importing and then manipulating files as text or strings. A powerful set of functions for manipulating strings is
available and is similar to those for working with other data objects.
Import the mitochondrial (MT) chromosome of the human genome from a FASTA-format file.
HSGenome = Import["ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_MT/hs_ref_chrMT.fa.gz", "FASTA"];
HSGenome
Count the number of characters in the file and then display the first 200.
StringCount[HSGenome, LetterCharacter]
{16571}
StringTake[HSGenome, 200]
{GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGAT
AGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTAT
CGCACCTACGTTCAATATTACAGGCGAACATACCTACTA}
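The same calls can be tried on a short literal string before running them on the full genome. The following is a minimal sketch; the name `sample` and its contents are made up for illustration and are not part of the notebook.

```mathematica
(* Hypothetical 20-character sequence, used only for illustration *)
sample = "GATCACAGGTCTATCACCCT";

(* Number of letter characters in the string *)
StringCount[sample, LetterCharacter]   (* 20 *)

(* Total length, counting every character, letters or not *)
StringLength[sample]                   (* 20 *)

(* First 10 characters *)
StringTake[sample, 10]                 (* "GATCACAGGT" *)
```

Both StringCount and StringTake thread over lists of strings. The results shown for HSGenome are wrapped in braces, which suggests that HSGenome holds a list containing a single sequence string.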
2 Genome Pattern Matching.nb
Pattern Matching
Mathematica’s pattern matching features make it easy to search strings for any pattern of interest.
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;
For example, this searches the full genome for all occurrences of the pattern.
StringCases[HSGenome, pattern]
( CCAAACC CCAAACC CCAAACC GGTTTGG AACCCAA CCAAACC CCAAACC AAGGGAA TTAAATT AATTTAA AATTTAA …
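To see what the pattern accepts, it can help to apply it to a few short strings. The sketch below uses made-up test strings. In a string pattern, each `x_` or `y_` stands for one character, and a repeated name must match the same character every time it appears.

```mathematica
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

(* Two equal characters, three equal characters, then the first pair again *)
StringCases["TTCCAAACCGG", pattern]   (* {"CCAAACC"} *)

(* x and y are not required to differ, so a run of one letter also matches *)
StringCases["AAAAAAA", pattern]       (* {"AAAAAAA"} *)

(* No match: the final pair differs from the opening pair *)
StringCases["CCAAAGG", pattern]       (* {} *)
```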
Here are the positions in the sequence at which the pattern occurs.
StringPosition[HSGenome, pattern]
( {298, 304} {303, 309} {304, 310} {350, 356} {554, 560} {654, 660} {915, 921} {1686, 1692} {1722, 1728} {1791, 1797} {2050, 2056 …
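Some of the reported positions overlap, for example {303, 309} and {304, 310}. This is consistent with StringPosition including overlapping matches by default, whereas StringCases does not. A small sketch on a made-up string of eight identical characters shows the difference.

```mathematica
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

(* Hypothetical input: 8 identical characters, so two 7-character windows match *)
s = "CCCCCCCC";

(* Overlapping matches are reported by default *)
StringPosition[s, pattern]                      (* {{1, 7}, {2, 8}} *)

(* Restrict the result to non-overlapping matches *)
StringPosition[s, pattern, Overlaps -> False]   (* {{1, 7}} *)

(* StringCases skips overlaps unless Overlaps -> True is given *)
StringCases[s, pattern]                         (* {"CCCCCCC"} *)
```

Because of this difference, the list of matches and the list of positions need not line up one to one.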
Color those subsequences in the range 2000 to 2500 that match the specified pattern.
{TACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACT
TTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTA
AATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAA
AATTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACC
TAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAA
CTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGCCTGCGTCAGATCAAAACACTGAA
CTGACAATTAACAGCCCAATATCTACAATCAACCAACAAGTCATTATTACCCTCACTGTC
AACCCAACACAGGCATGCTCATAA
GGAAAGGTTAAAAAAAGTAAAAGGAACTCGGCAAACCTTACCCCGCCTGTT}
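The input that produced the colored output did not survive in this copy. One possible way to get a similar result is sketched below. It is an assumption, not necessarily the notebook's own code, and it uses a short made-up string in place of the genome.

```mathematica
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

(* Hypothetical short sequence containing one match, "TTAAATT" *)
sample = "TTGTCCAAGATTAAATTTGCCCACAG";

(* StringSplit cuts the string at each match m and puts Style[m, Red] in its place;
   Row then displays the pieces side by side *)
Row[StringSplit[sample, m : pattern :> Style[m, Red]]]
```

To apply the same idea to the range used above, `sample` could be replaced by `StringTake[First[HSGenome], {2000, 2500}]`, assuming HSGenome is a list holding a single sequence string.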