Fred S. Roberts
DIMACS
Rutgers University
Outline
Phylogenetic Tree
Reconstruction
Phylogeny (continued)
New methods of phylogenetic tree
reconstruction owe a significant amount to
modern methods of discrete mathematics (DM)
and theoretical computer science (TCS).
Trees, supertrees, and consensus trees will all be
discussed at length in this meeting.
I will only make a few brief remarks about
them.
Database Issues
Assembling the tree of life requires
collecting massive amounts of data
about the world's species.
Making it a collaborative project
requires making such data universally
available.
There are great challenges for Math and
CS, specifically DM and TCS.
Thanks to the Global Biodiversity Information Facility
(GBIF) for many of the following ideas.
Complexity of Data
In many ways,
data about the
world's
species are
far more
complex than
genetic or
protein
sequence
data. (GBIF)
Nomenclature
Nomenclature (contd)
The same species is
often named more
than once.
On average, each
species has two
additional names
(synonyms) besides
its own name. (GBIF)
Nomenclature (contd)
Thus, there is need to
assemble names in an
electronic catalogue,
with synonyms and
common misspellings.
This would be of
fundamental
importance in aiding
research on
biodiversity.
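A minimal sketch of what such a catalogue might look like, assuming a simple in-memory mapping from every known variant (synonym or misspelling) to one accepted name. The class name `NameCatalogue`, its methods, and the sample entries are hypothetical and chosen only for illustration; they are not taken from GBIF.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore trivial formatting."""
    return " ".join(name.lower().split())


class NameCatalogue:
    """Hypothetical catalogue: maps each name variant to one accepted name."""

    def __init__(self) -> None:
        self._accepted = {}  # normalized variant -> accepted name

    def add(self, accepted, synonyms=(), misspellings=()):
        # Register the accepted name itself plus every known variant.
        for variant in (accepted, *synonyms, *misspellings):
            self._accepted[normalize(variant)] = accepted

    def resolve(self, name):
        """Return the accepted name, or None if the variant is unknown."""
        return self._accepted.get(normalize(name))


# Illustrative entries only.
catalogue = NameCatalogue()
catalogue.add(
    "Puma concolor",
    synonyms=["Felis concolor"],
    misspellings=["Puma concolour"],
)
print(catalogue.resolve("felis  concolor"))
```

An exact-match lookup like this only handles variants that have already been recorded; names that are merely close to a catalogue entry need a similarity search.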
Nomenclature (contd)
Because of errors,
one major
challenge for TCS
is data cleaning.
Nomenclature (contd)
Another challenge is to search a database
to see if two entries are similar.
This is a standard problem in database
theory.
TCS algorithms involving k-nearest
neighbor and other methods are very
helpful here.
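As a rough illustration of the similarity search described above, the sketch below finds the k nearest names to a query by brute force under edit (Levenshtein) distance. The choice of edit distance, the function names, and the sample names are assumptions made for this example; a real system would likely use an index rather than scanning every entry.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def k_nearest_names(query, names, k=3):
    """Return the k names closest to the query (case-insensitive)."""
    q = query.lower()
    return heapq.nsmallest(k, names, key=lambda n: edit_distance(q, n.lower()))


# Illustrative data only.
names = ["Puma concolor", "Panthera leo", "Panthera tigris"]
print(k_nearest_names("Puma concolour", names, k=1))
```

Two entries whose distance falls below some chosen threshold could then be flagged as candidate duplicates for a curator to review.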
Name        Equal to           Size in bytes
Bit         1 bit              1/8
Nibble      4 bits             1/2 (rare)
Byte        8 bits             1
Kilobyte    1,024 bytes        1,024
Megabyte    1,024 kilobytes    1,048,576
Gigabyte    1,024 megabytes    1,073,741,824
Terabyte    1,024 gigabytes    1,099,511,627,776
Petabyte    1,024 terabytes    1,125,899,906,842,624
Exabyte     1,024 petabytes    1,152,921,504,606,846,976
Zettabyte   1,024 exabytes     1,180,591,620,717,411,303,424
Yottabyte   1,024 zettabytes   1,208,925,819,614,629,174,706,176
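The unit sizes listed above follow one rule: each unit from the kilobyte upward is 1,024 times the previous one, so the n-th unit is 1,024 to the power n bytes. A short sketch that recomputes them, with the list `UNITS` and the function `unit_sizes` introduced here only for illustration:

```python
# Unit names from kilobyte upward, in increasing order of size.
UNITS = [
    "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte", "Yottabyte",
]


def unit_sizes():
    """Map each unit name to its size in bytes, using powers of 1,024."""
    return {name: 1024 ** (i + 1) for i, name in enumerate(UNITS)}


sizes = unit_sizes()
for name, size in sizes.items():
    # The ',' format spec inserts thousands separators.
    print(f"{name:<10} {size:,}")
```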
Interoperability
Goal: Devise
standards for
datasets so as to
allow researchers to
collaborate across
datasets; develop
standards leading to
database
interoperability.
(GBIF)
Interoperability
Challenge: How do we develop ways to
more accurately represent observational
or experimental data so that others may
use them? (Jessie Kennedy)
Challenge: Deal with issues of
inconsistency and scalability.
Challenge: Formalize issues of policy with
regard to others' databases.
Challenge: Interoperability over a diversity
of users and types of equipment.
Interoperability
One approach: the Semantic Web, the
idea used to express the growing
desire to make information access on
the Web more knowledge-based so
humans and intelligent software can
work together. (Susan Gauch)
Interoperability
Another
approach: Make
use of languages
such as XML
developed to aid
interoperability in
business and
military
collaborations.
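To make the XML idea concrete, here is a small sketch that serializes a species record to XML and reads it back using Python's standard `xml.etree.ElementTree` module. The element names (`species`, `acceptedName`, `synonym`) and the sample record are hypothetical; they do not correspond to any published biodiversity schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(accepted_name, synonyms):
    """Serialize one species record to an XML string (hypothetical layout)."""
    root = ET.Element("species")
    ET.SubElement(root, "acceptedName").text = accepted_name
    for synonym in synonyms:
        ET.SubElement(root, "synonym").text = synonym
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Parse the XML string back into (accepted_name, synonyms)."""
    root = ET.fromstring(xml_text)
    accepted_name = root.findtext("acceptedName")
    synonyms = [element.text for element in root.findall("synonym")]
    return accepted_name, synonyms


# Illustrative record only.
xml_text = record_to_xml("Puma concolor", ["Felis concolor"])
print(xml_text)
print(xml_to_record(xml_text))
```

Because both sides agree only on the element names and not on any internal storage format, two databases built on different software could exchange records this way.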