
2.

BIOMEDICAL DATA
AND DATABASES
Biomedical Informatics
Assoc. Prof. Tomaž Vrtovec, Ph.D.

University of Ljubljana, Faculty of Electrical Engineering Electrical Engineering, level 2


Laboratory of Imaging Technologies International course
2. Biomedical data and databases 2 / 50

BIOMEDICAL DATA
What are biomedical data?

A biomedical datum is a single observation of a patient or a biological process, and is defined by the following elements:
- the patient or the biological process in question
- the parameter being observed
- the value and units of the parameter in question
- the observation time

Example:
- John Doe (ID: 0110975500213)
- body weight
- 74 kg
- 12.9.2012


BIOMEDICAL DATA
What are biomedical data?

The acquisition of biomedical data is influenced by:


- observation circumstances
- observation uncertainty
- observation interpretation

[Figure: a reading of 74.5 kg at 4:22 and at 19:52; should it be recorded as 74 kg or as 75 kg?]


BIOMEDICAL DATA
DIKW model

Wisdom

Knowledge
(active information)

Information
(formed data)

Data

Sources:
- H. Cleveland: Information as resource. The Futurist, December 1982, Page 34
- J. Rowley: The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information Science 33(2):163-180, 2007

BIOMEDICAL DATA
DIKW model

[Figure: the DIKW hierarchy plotted as connectivity against understanding]
- Data: formation of parts
- Information: connection of parts (Who? What? When? Where?)
- Knowledge: forming of wholes (How?)
- Wisdom: connection of wholes (Why?)
- Understanding axis: research, absorption, action, influence, judgement
- Experience: doing things right
- Novelty: doing the right things


DATA CLASSIFICATION
According to the structure

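Data
- Descriptive (qualitative)
  - Nominal (illustration: A, B, C)
  - Ordinal (illustration: A > C > B)
- Numerical (quantitative)
  - Interval (illustration: scale with the values 2, 5, 8)
  - Ratio (illustration: scale with the values 0, 2, 5, 8)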
Source: S.S. Stevens: On the theory of scales of measurement. Science 103(2684):677-680, 1946

DATA STRUCTURE
Nominal biomedical data

The data is descriptive in nature and therefore cannot be ordered according to size. Values such as the mean or the median cannot be computed.
[Figure: three unordered categories A, B, C]

Operations:
- equality / inequality
- grouping

Example of dichotomous data:
Gender:
- male
- female

Example of non-dichotomous data:
Citizenship:
- Slovenian
- Italian
- Austrian
- …

DATA STRUCTURE
Ordinal biomedical data

The data is descriptive in nature, but can be ordered according to size. The mean value still cannot be computed, but the median value can be.
[Figure: three ordered categories, A > C > B]

Operations:
- everything that was enabled for nominal data
- greater / less

Example of dichotomous data:
Age:
- younger
- older

Example of non-dichotomous data:
Opinion:
- completely agree
- partially agree
- partially disagree
- completely disagree

DATA STRUCTURE
Interval biomedical data

The data is quantitative in nature, can be ordered according to size and can be added or subtracted, but ratios cannot be computed. The mean value can be computed.
[Figure: categories A, B, C on a numeric scale with the values 2, 5, 8]

Operations:
- everything that was enabled for ordinal data
- addition / subtraction

Example:

- temperature on the Celsius (°C) scale
- date


DATA STRUCTURE
Ratio biomedical data

The data is quantitative in nature, and ratios can be computed because a reference value (the absolute zero) exists.
[Figure: categories A, B, C on a numeric scale with the values 0, 2, 5, 8]

Operations:
Operations:
- everything that was enabled for interval data
- multiplication / division

Example:

- temperature on the Kelvin (K) scale
- body height


DATA STRUCTURE
Summary

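Each scale adds (+) operations to those of the scale listed before it.

Nominal
- statistical operations: mode
- empirical operations: equality, inequality
- mathematical operations: = ≠

Ordinal
- statistical operations: + median value, + percentiles
- empirical operations: + greater than, + less than
- mathematical operations: = ≠ > <

Interval
- statistical operations: + mean value, + standard deviation, + correlation
- empirical operations: + addition, + subtraction
- mathematical operations: = ≠ > < + −

Ratio
- statistical operations: + coefficient of variation, + logarithm, + geometric mean
- empirical operations: + multiplication, + division
- mathematical operations: = ≠ > < + − × ÷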
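
The summary can be made concrete with a short Python sketch that applies only the statistics each scale permits. All sample values below are invented for illustration, and the numeric coding of the opinion scale is an assumption, not part of the lecture material.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
citizenship = ["Slovenian", "Italian", "Slovenian", "Austrian"]  # nominal
opinion = [1, 2, 2, 4, 3]  # ordinal codes: 1 = completely agree ... 4 = completely disagree
temperature_c = [36.5, 37.0, 38.5]  # interval (Celsius scale)
body_height_cm = [160.0, 180.0]     # ratio

# Nominal: only the mode is meaningful.
nominal_mode = statistics.mode(citizenship)

# Ordinal: the median is meaningful, the mean is not.
ordinal_median = statistics.median(opinion)

# Interval: differences and the mean are meaningful, ratios are not.
interval_mean = statistics.mean(temperature_c)
interval_range = max(temperature_c) - min(temperature_c)

# Ratio: an absolute zero exists, so ratios are meaningful.
height_ratio = body_height_cm[1] / body_height_cm[0]

# The same two temperatures give different "ratios" in Celsius and in Kelvin,
# which is why a ratio computed on an interval scale carries no meaning.
ratio_celsius = 20.0 / 10.0
ratio_kelvin = (20.0 + 273.15) / (10.0 + 273.15)
```

The last two lines show the practical difference between the interval and the ratio scale: 20 °C is not "twice as warm" as 10 °C, because the corresponding Kelvin values differ by only a few percent.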


DATA CLASSIFICATION
According to dimensionality

Data

0-D 1-D 2-D 3-D 4-D n-D

Low complexity High complexity

In (bio)medicine we often observe phenomena and processes over time, which is therefore a common independent parameter.

DATA DIMENSIONALITY
0-D biomedical data

Non-dimensional data is represented by individual measurements of the observed quantity.

Example:
- measurement: body weight, e.g. 74 kg
- measurement: blood pressure, e.g. 120/80 mmHg (systolic/diastolic)


DATA DIMENSIONALITY
1-D biomedical data

One-dimensional data is represented by values that depend on one independent parameter.

Example:
- 1D signal: body weight depending on body height
- 0D video: heart beat (0D + time)


DATA DIMENSIONALITY
2-D biomedical data

Two-dimensional data is represented by values that depend on two independent parameters.

Example:
- 2D images: radiographic (X-ray) image of the chest
- 1D video: levels of sound pressure against the resonance frequency of the Helmholtz resonator (1D + time)


DATA DIMENSIONALITY
3-D biomedical data

Three-dimensional data is represented by values that depend on three independent parameters.

Example:
- 3D images: magnetic resonance (MR) images of the head at different locations
- 2D video: modelling of the electric waves in the heart muscle (2D + time)


DATA DIMENSIONALITY
4-D biomedical data

Four-dimensional data is represented by values that depend on four independent parameters.

Example:
- 4D data: 3D computed tomography (CT) image of the chest with superimposed lung movements
- 3D video: 4D ultrasound (3D + time)


DATA DIMENSIONALITY
n-D biomedical data

Multi-dimensional data is represented by values that depend on n independent parameters.

Example:
- Genetic code

The term “curse of dimensionality” refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (e.g. noise, poor evaluation) that do not occur in low-dimensional spaces.

An important aspect is therefore also the development and application of methods for dimensionality reduction of the observed data.
Source: M. Krzywinski et al.: Circos: An information aesthetic for comparative genomics. Genome Research 19:1639-1645, 2009

DATA CLASSIFICATION
According to type

Data

Descriptions Measurements Signals Images Videos

Low dimensionality High dimensionality


DATA TYPE
Biomedical descriptions

Descriptions are data that are commonly stored in textual form and are acquired directly (e.g. by a conversation between the clinician and the patient, i.e. narrative data):
- identifications
- observations
- diagnoses
- therapies
- medications
- vaccinations
- …


DATA TYPE
Biomedical measurements

Measurements are represented by one numerical value and the corresponding measurement unit, for example:
- temperature (°C)
- heart rate (beats/minute)
- blood pressure (mmHg)
- blood analysis (mmol/l, mg/l)
- urinalysis (mmol/24h, mg/24h)
- body weight (kg)
- …

[Figure: measurement accuracy versus precision]


DATA TYPE
Biomedical signals

Signals are measurement values, represented as a function of time (signals that are functions of frequency or wavelength are commonly named spectra). They are acquired by measuring the electrical, mechanical, biochemical and other physical quantities that describe the patient or the biological process:
- electrocardiogram (ECG)
- electroencephalogram (EEG)
- electromyogram (EMG)
- spirogram
- sound
- blood pressure
- …

[Figure: example ECG and EEG signals]


DATA TYPE
Biomedical images

Images are two-dimensional (2D) or three-dimensional (3D) measurements of a specific quantity, represented as a function of spatial coordinates. They make it possible to observe the spatial properties and structural characteristics of the object in question:
- microscopic images
- radiographic (X-ray) images
- ultrasound images
- computed tomography (CT) images
- magnetic resonance (MR) images
- …

Imaging informatics is the field of biomedical informatics that is concerned with biomedical images.


DATA TYPE
Biomedical videos

Videos are three-dimensional (2D + time) or four-dimensional (3D + time) measurements of a specific quantity, represented as a function of spatial coordinates over time:
- ultrasound video
- CT/MR video
- …
The observation dimension is commonly time, but can also be frequency, wavelength or any other varying quantity.

As a video can be interpreted as a sequence of 2D or 3D images, imaging informatics is also concerned with this topic.

DATABASES
What are biomedical databases?

Databases are organized collections of data that model relevant aspects of reality by supporting the processes that require the data in question.

Biomedical databases are data “libraries” about life sciences, acquired on the basis of scientific experiments and computational analyses.

In relation to databases, one of the most important operations is information retrieval, which consists of two elements:
- indexing according to terms made up of metadata
- retrieval according to queries broken down into metadata


DATABASES
Metadata

Metadata are structured information about the data that enable data search, identification, selection and access, and aid in data organization, indexing, scheduling and management.

They are, in practice, “data about data”, “data about data carriers” or “data about data contents”.

Examples:
- geographic latitude and longitude (not part of the Earth, but required to define a location)
- lateral, frontal and transversal planes (not part of the body, but required to define a location)


DATABASES
Metadata (2)

Metadata can be:
- descriptive, which serve for identifying, searching and retrieving the data (e.g. International Statistical Classification of Diseases and Related Health Problems, ICD-10 [1])
- structural, which serve to describe the internal (logical or physical) data structure as well as relationships among data parts (e.g. structural hierarchy for classification of diseases)
- administrative, which serve to describe the technical aspects of the source, related to its usage, analysis and management (e.g. security of patient data)

The purpose of metadata is to improve the interoperability within and among information systems, i.e. to improve the capability of information systems and corresponding processes to exchange data, information and knowledge.

[1] Online access via https://en.wikipedia.org/wiki/ICD-10 or http://apps.who.int/classifications/icd10/.

DATABASES
Properties

Database
- Database content: bibliographic, full-text, annotated, aggregated
- Data model: hierarchical, network, relational, associative
- Data indexing: manual, automated
- Data retrieval: exact-match, partial-match


DATABASE CONTENT
Bibliographic content

Bibliographic content refers to databases with links to the literature (scientific articles, books, etc.).

1. Bibliographic databases
Links in the form of citations to medical (and other) literature.

- MEDLINE [1]
  http://www.pubmed.gov
  biomedicine
  > 5500 publications
  > 22 million entries
  1950 – today
- EMBASE
  biomedicine
  > 7000 publications
  > 20 million entries
  1947 – today
- CINAHL [2]
  health care
  > 3000 publications
  > 2.6 million entries
  1937 – today

[1] Medical Literature Analysis and Retrieval System (MEDLARS) Online
[2] Cumulative Index to Nursing and Allied Health Literature

DATABASE CONTENT
Bibliographic content (2)

- Web of Science
  http://www.isiknowledge.com/WOS/
  > 12000 publications
  > 40 million entries
  1900 – today
- SciVerse Scopus
  http://www.scopus.com
  > 19000 publications
  > 47 million entries
  1823 – today
- Google Scholar
  http://scholar.google.com
  >>> publications
  >>> entries
  (relevance controversy)

These three bibliographic databases are not limited to biomedicine and health care.


DATABASE CONTENT
Bibliographic content (3)

2. Online catalogues
Web pages that do not display actual content but rather links to other web pages.

- HealthFinder
  http://healthfinder.gov
- HON Select
  http://www.hon.ch/HONselect/

3. Specialized lists
These contain not only links to literature and web pages but also more diverse content.

- National Guideline Clearinghouse (NGC)
  http://guideline.gov


DATABASE CONTENT
Full-text content

Full-text content refers to databases that, besides links to the literature, also provide access to the actual text.
Publishing companies in the field of biomedical literature have the leading role through their online interfaces, where a subscription gives access to the full text (HTML or PDF) and other content (e.g. multimedia).

- ScienceDirect
  Elsevier
  http://www.sciencedirect.com
- Wiley Online Library
  John Wiley & Sons
  http://onlinelibrary.wiley.com
- SpringerLink
  Springer
  http://www.springerlink.com
- OvidSP
  Wolters Kluwer
  http://ovidsp.ovid.com


DATABASE CONTENT
Annotated content

Annotated content refers to databases that do not contain literature but rather other forms of data.

1. Signals
- PhysioBank
  http://www.physionet.org
- SIESTA
  http://www.thesiestagroup.com

2. Images
- Visible Human Project
  http://www.nlm.nih.gov/research/visible/
- Images.MD
  http://www.springerimages.com/ImagesMD/

3. Gene sequences
- GenBank
  http://www.ncbi.nlm.nih.gov/genbank/
- Genomes On Line Database (GOLD)
  http://www.genomesonline.org


DATABASE CONTENT
Annotated content (2)

4. Evidence-based medicine (EBM)
- Cochrane Library
  http://www.cochrane.org
- Clinical Evidence
  http://www.clinicalevidence.com
- Essential Evidence Plus
  http://www.essentialevidenceplus.com

5. Bibliographic citations
- Science Citation Index (SCI)
  http://ip-science.thomsonreuters.com
- CiteSeerX
  http://citeseerx.ist.psu.edu

6. Other data
- ClinicalTrials.gov
  http://clinicaltrials.gov


DATABASE CONTENT
Aggregated content

Aggregated content refers to the aggregation of content from the first three categories: bibliographic, full-text and annotated content.

- MedlinePlus
  http://www.nlm.nih.gov/medlineplus/
- MedWeaver
  http://www.unboundmedicine.com
- Merck Medicus
  http://www.merckmedicus.com
- Generic Model Organism Database (GMOD)
  http://gmod.org


DATA MODEL
Hierarchical model

Data is organized in a tree-like structure, therefore enabling parent/child relationships in the form 1-N:
- each parent can have an arbitrary number of children
- each child has exactly one parent

Example:
X: Diseases of the respiratory system [J00-J99]
- Acute upper respiratory infections [J00-J06]
  - Acute nasopharyngitis [common cold] [J00]
  - Acute sinusitis [J01]
  - …
- Influenza and pneumonia [J09-J18]
  - Influenza due to certain identified influenza virus [J09]
  - Influenza due to other identified influenza virus [J10]
  - …
- ...
- Other diseases of the respiratory system [J95-J99]
  - Postprocedural respiratory disorders, not elsewhere classified [J95]
  - Respiratory failure, not elsewhere classified [J96]
  - …

Source: International Statistical Classification of Diseases and Related Health Problems (ICD-10)
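
A minimal Python sketch of the 1-N rule, assuming the tree is stored as a child-to-parent dictionary (this storage layout is an illustrative choice, not something prescribed by ICD-10). Because a dictionary key can map to only one value, the constraint that each child has exactly one parent is enforced by the data structure itself.

```python
# Each child code maps to exactly one parent code (1-N relationship).
# The codes follow the ICD-10 excerpt in the example; the dictionary layout
# is an assumption made for this sketch.
PARENT = {
    "J00-J06": "J00-J99",
    "J09-J18": "J00-J99",
    "J95-J99": "J00-J99",
    "J00": "J00-J06",
    "J01": "J00-J06",
    "J09": "J09-J18",
    "J10": "J09-J18",
    "J95": "J95-J99",
    "J96": "J95-J99",
}


def path_to_root(code):
    """Return the list of codes from `code` up to the root of the tree."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def children(code):
    """Return all direct children of `code`, in insertion order."""
    return [child for child, parent in PARENT.items() if parent == code]
```

For example, `path_to_root("J01")` walks from acute sinusitis through its block up to the chapter, and `children("J09-J18")` lists the categories of the influenza and pneumonia block.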

DATA MODEL
Network model

Data is organized in a graph-like (network) structure, therefore enabling parent/child relationships in the form M-N:
- each parent can have an arbitrary number of children
- each child can have an arbitrary number of parents

Example:
Diagnostic examinations
- Medical history
- Physical examination
- Laboratory examinations
  - Blood
  - Urine
  - …
- Radiological examinations
  - X-rays
  - Computed tomography (CT)
  - …
- Endoscopic examinations
  - Colonoscopy
  - Bronchoscopy
  - …


DATA MODEL
Relational model

Data is organized in a tabular structure and linked together by “keys” that represent the relationships among the content elements.

Table: Diagnostic examinations

Examination code   Examination name
1                  Medical history
2                  Physical exam.
3                  Laboratory exam.
4                  Radiological exam.
5                  Endoscopic exam.

Table: Completed diagnostic examinations

Service code   Date (YYYY-MM-DD)   Examination code   Patient code
067563         2011-10-01          1                  04376
067564         2012-04-23          4                  00249
067565         2012-07-11          5                  08562
067566         2012-07-19          4                  12765

Table: Results (Key: Examination code = 4)

Examination name     Service code   Date (YYYY-MM-DD)   Patient code
Radiological exam.   067564         2012-04-23          00249
Radiological exam.   067566         2012-07-19          12765
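
The key-based lookup can be reproduced with Python's built-in `sqlite3` module. This is only a sketch: the table and column names (`examination`, `completed_examination`, `service_date`, and so on) are invented here, while the rows are the ones from the example tables.

```python
import sqlite3

# In-memory database; table and column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE examination (
    examination_code INTEGER PRIMARY KEY,
    examination_name TEXT NOT NULL
);
CREATE TABLE completed_examination (
    service_code     TEXT PRIMARY KEY,
    service_date     TEXT NOT NULL,
    examination_code INTEGER NOT NULL REFERENCES examination(examination_code),
    patient_code     TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO examination VALUES (?, ?)",
    [
        (1, "Medical history"),
        (2, "Physical exam."),
        (3, "Laboratory exam."),
        (4, "Radiological exam."),
        (5, "Endoscopic exam."),
    ],
)
conn.executemany(
    "INSERT INTO completed_examination VALUES (?, ?, ?, ?)",
    [
        ("067563", "2011-10-01", 1, "04376"),
        ("067564", "2012-04-23", 4, "00249"),
        ("067565", "2012-07-11", 5, "08562"),
        ("067566", "2012-07-19", 4, "12765"),
    ],
)

# The examination code is the key that links both tables.
rows = conn.execute(
    """
    SELECT e.examination_name, c.service_code, c.service_date, c.patient_code
    FROM completed_examination AS c
    JOIN examination AS e ON e.examination_code = c.examination_code
    WHERE c.examination_code = ?
    ORDER BY c.service_date
    """,
    (4,),
).fetchall()
```

With key 4, `rows` holds the two radiological examinations listed in the results table. Codes with leading zeros are stored as text so that they are not silently turned into integers.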


DATA MODEL
Associative model

Data is organized as individual parts, and the links among the content elements are defined in the form of associations.

Table: Entries

Code   Description
101    patient 00249
102    patient 08562
103    has performed
104    has not performed
105    successfully
106    not successfully
107    laboratory examination
108    radiological examination
109    by
110    radiography
111    computed tomography

Table: Links

Code   Source   Activity   Destination
201    101      103        108
202    201      109        111
203    202      000        105

Results:
201 = 101 + 103 + 108: [Patient 00249] [has performed] [radiological examination].
202 = 201 + 109 + 111: [Patient 00249 has performed radiological examination] [by] [computed tomography].
203 = 202 + 000 + 105: [Patient 00249 has performed radiological examination by computed tomography] [] [successfully].
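
How a link is expanded into a statement can be sketched in a few lines of Python. The dictionaries mirror the entries and links of the example; the function name `resolve` and the use of `0` for the empty element 000 are assumptions made for this illustration.

```python
# Entries: code -> description (taken from the example).
ENTRIES = {
    101: "patient 00249",
    102: "patient 08562",
    103: "has performed",
    104: "has not performed",
    105: "successfully",
    106: "not successfully",
    107: "laboratory examination",
    108: "radiological examination",
    109: "by",
    110: "radiography",
    111: "computed tomography",
}

# Links: code -> (source, activity, destination); 0 stands for the empty element 000.
LINKS = {
    201: (101, 103, 108),
    202: (201, 109, 111),
    203: (202, 0, 105),
}


def resolve(code):
    """Expand an entry or link code into its textual statement."""
    if code == 0:
        return ""
    if code in ENTRIES:
        return ENTRIES[code]
    # A link may point to another link, so the expansion is recursive.
    parts = [resolve(part) for part in LINKS[code]]
    return " ".join(part for part in parts if part)
```

Because the source of link 202 is itself link 201, resolving 203 unfolds the whole chain into a single statement.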


DATA INDEXING
Controlled terminologies

A controlled terminology enables indexing on the basis of pre-selected, authorized concepts, terms and keys (metadata):
- a concept is an idea or object that exists in the world
- a term is the actual string of one or more words that represent a concept (the canonical form is the preferred term, others are synonyms)
- a key is one of the versions of the term
Important aspects are hierarchy, synonymity and relationships of the terminology.

Examples:
- Medical Subject Headings (MeSH)
  http://www.nlm.nih.gov/mesh/
- Unified Medical Language System (UMLS)
  http://www.nlm.nih.gov/research/umls/


DATA INDEXING
Manual indexing

Manual data indexing is performed by humans, who assign indexing terms and attributes to documents, often following a specific protocol and using a controlled terminology.

[Figure: concept, terms and keys for atrial fibrillation]
- Concept: Atrial fibrillation
- Terms: Atrial fibrillation; Auricular fibrillation
- Keys (strings): Atrial fibrillation; Fibrillation, atrial; Atrial fibrillations; Auricular fibrillation; Fibrillation, auricular; Auricular fibrillations
- Abbreviated forms also shown: af; afib; a fib
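
The concept/term/key hierarchy can be pictured as two lookup tables. The Python sketch below is an illustration only: the dictionary names and the normalization rule (lower-casing and collapsing whitespace) are assumptions, and real terminologies such as MeSH or UMLS are far richer.

```python
# Keys (strings) mapped to their term; spellings are normalized to lower case.
KEY_TO_TERM = {
    "atrial fibrillation": "Atrial fibrillation",
    "fibrillation, atrial": "Atrial fibrillation",
    "atrial fibrillations": "Atrial fibrillation",
    "auricular fibrillation": "Auricular fibrillation",
    "fibrillation, auricular": "Auricular fibrillation",
    "auricular fibrillations": "Auricular fibrillation",
}

# Terms mapped to the concept they represent (the preferred term names the concept).
TERM_TO_CONCEPT = {
    "Atrial fibrillation": "Atrial fibrillation",
    "Auricular fibrillation": "Atrial fibrillation",
}


def index_concept(key):
    """Return the concept for a key string, or None if the key is not authorized."""
    normalized = " ".join(key.lower().split())
    term = KEY_TO_TERM.get(normalized)
    return TERM_TO_CONCEPT.get(term)
```

Every authorized spelling ends up indexed under the same concept, while a string outside the controlled terminology yields `None`.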


DATA INDEXING
Manual indexing (2)

Manual indexing is becoming obsolete, mostly due to the following limitations:
- inconsistency
- time consumption
It is still used for indexing some bibliographic databases, but is being replaced by semi-automated and automated indexing.

- Dublin Core Metadata Initiative (DCMI)
  http://dublincore.org
- Resource Description Framework (RDF)
  http://www.w3.org/RDF/


DATA INDEXING
Automated indexing

Automated data indexing is performed by a computer algorithm:
- extracting all alphanumeric sequences from the term
- removing the stop words, defined by the negative dictionary
- stemming the term to ensure unique indexing
Several problems may arise:
- synonymity: equal meaning of different terms
  (e.g. “hypertension” and “high blood pressure”)
- multiplicity: different meanings of the same term
  (e.g. “lead” as an electrocardiography output or a chemical element)
- content: sometimes data does not reflect the topic
- context: variable meaning in relation to other data
- morphology: variable form of the language in use
  (e.g. different term stems)
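
The three indexing steps can be sketched in Python as follows. The stop-word list and the suffix-stripping rule are deliberately naive stand-ins invented for this example; a production system would use a full negative dictionary and an established stemmer.

```python
import re

# Hypothetical negative dictionary (stop-word list); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "with"}

# Naive suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Extract alphanumeric sequences, drop stop words, and stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(word) for word in words if word not in STOP_WORDS]
```

Both "fibrillation" and "fibrillations" reduce to the same stem, which is what makes the index entry unique. The morphology problem listed above shows up as soon as a word form does not fit the suffix list.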


DATA INDEXING
Automated indexing (2)

Term-weighting methods index the terms according to the frequency of their appearance in the data:
WEIGHT (term, data) = TF (term, data) × IDF (term)
- term frequency (TF):
  TF (term, data) = the frequency of the term in data
- inverse data frequency (IDF):
  IDF (term) = log (no. of data in the database / no. of data containing the term) + 1

Link-based methods index the terms according to the frequency of links from other data:
- The PageRank (PR) algorithm (used by the Google search engine) gives more weight to a web page based on the number of other web pages that link to it.
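
A small Python sketch of the TF × IDF weighting follows. The three-document collection is invented, and the natural logarithm is an assumption, since the formula does not state the base of the logarithm.

```python
import math
from collections import Counter

# Hypothetical mini-database; each data item is already reduced to index terms.
DOCUMENTS = {
    "d1": ["hypertension", "treatment", "hypertension"],
    "d2": ["hypertension", "diagnosis"],
    "d3": ["lead", "poisoning", "treatment"],
}


def tf(term, doc_id):
    """Term frequency: how often the term occurs in the given data item."""
    return Counter(DOCUMENTS[doc_id])[term]


def idf(term):
    """Inverse data frequency: log(N / n_term) + 1, using the natural logarithm."""
    containing = sum(1 for terms in DOCUMENTS.values() if term in terms)
    if containing == 0:
        return 0.0  # the term does not occur anywhere in the database
    return math.log(len(DOCUMENTS) / containing) + 1


def weight(term, doc_id):
    """WEIGHT(term, data) = TF(term, data) * IDF(term)."""
    return tf(term, doc_id) * idf(term)
```

A term that is frequent in one data item but rare in the database as a whole receives the highest weight. Here "lead" in `d3` has a higher IDF than "hypertension", which occurs in two of the three items.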


DATA RETRIEVAL
Exact-match

In exact-match searching, the retrieval system gives the user all data that exactly match the criteria specified in the search query.

As data (e.g. documents) are often represented by sets of elements (e.g. words), set-based (Boolean) searching is commonly used. The similarity between the data and the search query is defined by the Boolean logical operations:
- conjunction (AND)
- disjunction (OR)
- negation (NOT)

This kind of matching is usually applied to bibliographic content. For efficient data retrieval, insight into the behaviour of Boolean operators as well as the structure of the database in question is required.
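
Set-based searching maps directly onto Python's set operators. The sketch below uses an invented three-document collection; `&`, `|` and `-` play the roles of AND, OR and NOT.

```python
# Hypothetical collection: document identifier -> set of index terms.
DOCUMENTS = {
    1: {"hypertension", "treatment"},
    2: {"hypertension", "diagnosis"},
    3: {"lead", "poisoning", "treatment"},
}


def matching(term):
    """Return the identifiers of all documents that contain the term."""
    return {doc_id for doc_id, terms in DOCUMENTS.items() if term in terms}


all_ids = set(DOCUMENTS)

# hypertension AND treatment
and_result = matching("hypertension") & matching("treatment")

# hypertension OR treatment
or_result = matching("hypertension") | matching("treatment")

# NOT treatment (complement with respect to the whole collection)
not_result = all_ids - matching("treatment")
```

Each result is an unordered set: a document either matches or it does not, which is why exact-match retrieval cannot rank its results.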

DATA RETRIEVAL
Partial-match

In partial-match searching, the retrieval system ranks the data according to their closeness to the criteria specified in the search query.

- In the case of the algebraic (vector-space) model, the data and the search query are represented as vectors. In such a vector space, the closeness between the data and the search query is a scalar value.
- In the case of a probabilistic model, the closeness between the data and the search query is represented as relevance, which is in turn computed by Bayes’ theorem or some other probabilistic approach.

This kind of matching is usually applied to full-text content. Efficient data retrieval requires neither specific knowledge nor knowledge about the structure of the database in question.
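
One common choice for the scalar closeness in the vector-space model is the cosine of the angle between the two vectors. The Python sketch below assumes that measure, raw term counts as vector components, and an invented vocabulary and collection; none of these choices is prescribed by the model itself.

```python
import math

# Hypothetical vocabulary that defines the axes of the vector space.
VOCABULARY = ["hypertension", "treatment", "diagnosis", "lead"]

# Hypothetical collection: document identifier -> list of index terms.
DOCUMENTS = {
    "d1": ["hypertension", "treatment", "hypertension"],
    "d2": ["hypertension", "diagnosis"],
    "d3": ["lead", "treatment"],
}


def to_vector(terms):
    """Represent a list of terms as a vector of raw term counts."""
    return [terms.count(word) for word in VOCABULARY]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_terms):
    """Return (document, closeness) pairs sorted from closest to farthest."""
    query = to_vector(query_terms)
    scores = {doc_id: cosine(query, to_vector(terms)) for doc_id, terms in DOCUMENTS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For the query `["hypertension", "treatment"]`, document `d1` matches both terms and is ranked first, while `d2` and `d3` each match one term and still appear in the ranking with a lower score.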

DATA RETRIEVAL
Model comparison

Set-based searching
  Advantages:
  - easy to understand
  - expressive search query
  Limitations:
  - without partial-match
  - complicated search vocabulary
  - retrieved data cannot be ranked
  - no direct link to data meaning

Algebraic model
  Advantages:
  - partial-match
  - term-weighting
  - ranking of the retrieved data
  Limitations:
  - higher complexity for computing the closeness
  - no direct link to data meaning

Probabilistic model
  Advantages:
  - partial-match
  - term-weighting
  - ranking of the retrieved data by relevance
  Limitations:
  - term weighting is binary (probabilities)
  - data can be divided into relevant and non-relevant


DATA RETRIEVAL
Success evaluation

Success can be evaluated qualitatively by analyzing the patterns of usage and response, and can be:
- system-oriented, to evaluate the system for data retrieval
- user-oriented, to evaluate the user of the system for data retrieval

Success can be evaluated quantitatively by computing relevance measures:

Recall = |relevant data ∩ retrieved data| / |relevant data|

Precision = |relevant data ∩ retrieved data| / |retrieved data|

Drop-out = |non-relevant data ∩ retrieved data| / |non-relevant data|

Drop-out is more commonly called fall-out.
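
The three measures can be computed with plain set arithmetic. In this Python sketch the database of ten items and the relevant and retrieved sets are invented, and the function assumes that none of the three denominators is empty.

```python
def evaluate(relevant, retrieved, database):
    """Compute recall, precision and drop-out from sets of item identifiers.

    Assumes that `relevant`, `retrieved` and the non-relevant part of
    `database` are all non-empty (otherwise a division by zero occurs).
    """
    non_relevant = database - relevant
    hits = relevant & retrieved
    return {
        "recall": len(hits) / len(relevant),
        "precision": len(hits) / len(retrieved),
        "drop_out": len(non_relevant & retrieved) / len(non_relevant),
    }


# Hypothetical database of ten items, four of which are relevant to the query.
database = set(range(1, 11))
relevant = {1, 2, 3, 4}
retrieved = {3, 4, 5}

scores = evaluate(relevant, retrieved, database)

# Retrieving the whole database maximizes recall but also drop-out.
everything = evaluate(relevant, database, database)
```

The second call illustrates the trade-off: returning every item gives a recall of 1.0, but precision falls to the share of relevant items in the database and the drop-out rises to 1.0.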

BIG DATA
The next milestone of innovation, competitiveness
and productivity

“Big data” are data that, due to their size and complexity, cannot be efficiently acquired, stored, analyzed or managed using currently established database management systems.

Source: J. Manyika et al.: Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011

CONCLUSION
Discussion, comments, questions…

- Present the definition of biomedical data and biomedical database.
- Describe the DIKW model.
- Present data classification according to the structure,
dimensionality and type. Give also examples.
- What are metadata?
- List and describe databases according to their content.
- List and describe the data models.
- List and describe the data indexing approaches.
- List and describe the data retrieval approaches.
- Does it make sense to achieve a recall of 100% when evaluating the success of
data retrieval? Does it make sense to achieve a 0% drop-out? Explain your
answers.