Proc. of the First Symposium on Healthcare Systems Interoperability (Alcalá de Henares, April 2009)
Table of contents

Foreword

CARDEA: Service platform for monitoring patients and medicines based on SIP-OSGi and RFID technologies in hospital environment
Saúl Navarro, Ramón Alcarria, Juan A. Botía, Silvia Platas, Tomás Robles

On processing processes in healthcare: combining processes and reasoning in personal health records
Leonardo Lezcano, Miguel-Angel Sicilia

Archetypes and ontologies to facilitate the breast cancer identification and treatment process
Ainhoa Serna(1), Jon Kepa Gerrikagoitia(1), Iker Huerga, Jose Antonio Zumalakarregi(2), Jose Ignacio Pijoan(3)
Annex I. Committees
Foreword
The first OpenHealth-Spain symposium aimed at bringing together researchers and professionals interested in the area of interoperability of health systems. The workshop took place at the Rectorate building of the University of Alcalá (Spain) on the 29th and 30th of April, 2009. The Information Engineering Unit, a research group of the Computer Science Department of the University, organized the event with the collaboration of the Hospital of Fuenlabrada (located in the metropolitan area of Madrid) and of the Health Division of Atos Origin.
The workshop was initially conceived by participants of the CISEP project ("Historia Clínica Inteligente para la seguridad del Paciente" / "Intelligent Clinical Records for Patient Safety", code FIT-350301-2007-18, funded by the Spanish Ministry of Industry), and later supported by members of an informal Spanish network of researchers and practitioners working on related topics. Even though the working language of the workshop was Spanish, peer-reviewed contributions were requested in English to achieve broader dissemination.
The workshop featured an invited talk by the Director of ATOS Research & Innovation, Jose Maria Cavanillas, titled "Personalized Health and the Future Internet", and a round table discussing the recently issued SemanticHealth report1, which was presented by Raimundo Lozano from the Hospital Clínic (Barcelona). A session was also devoted to discussing the EuroRec2 network, which promotes best practice in Electronic Health Records (EHR).
This proceedings book collects the peer-reviewed papers presented at the workshop, which reflect the diversity and high quality of the work being carried out at the national level on the topics of the workshop. The second edition of the workshop will take place at the Hospital of Fuenlabrada in 2010.
Miguel-Angel Sicilia
Workshop co-chair
1 http://ec.europa.eu/information_society/activities/health/docs/publications/2009/2009semantic-health-report.pdf
2 http://www.eurorec.org/
Abstract
The Electronic Health Record (EHR) is one of the most important research areas in the field of telemedicine. In this paper, the great European challenge of EHR standardization and interoperability, the CEN/ISO 13606 standard, is described. Our research group has an open line of work in this field, in which we are carrying out projects such as the design and development of a 'middleware' EHR server module compliant with the ISO 13606 standard, compatibility studies between different standards (CCR modeling, design of harmonization mechanisms based on archetypes), and the development of a demographic server compliant with the EN 13606 standard, integrated in different clinical trials.
1. Introduction
The ongoing monitoring of patients and their high mobility are the factors that drive the need for Electronic Health Records (EHR) in current health systems. Information systems should be able to transfer this information so that its meaning is preserved, regardless of where it is stored. For this reason, interoperability is an indispensable requirement for the EHR. To meet this requirement, EHR systems should be normalized or, at the very least, exchange normalized messages.
The main goal of the European norm EN 13606 is to normalize the transfer of EHR data between information systems (or parts of them) so that they can interoperate, without specifying how to implement such systems. The Task Force EHRCom, part of Working Group 1 of Technical Committee 251 (Health Informatics) of the European Committee for Standardization (CEN) [1], is responsible for the elaboration of the EN 13606 norm. It follows the current paradigms of standards creation: separation of points of view, separation of responsibilities, and separation of information and knowledge. This standard, automatically adopted as a Spanish UNE standard, is in the process of becoming an ISO standard. The first four parts have already been accepted, and the fifth part is following a common process in both standardization organizations under the Vienna Agreement.
This standard is oriented towards communication. It defines how to perform EHR interchange between information systems, allowing each of them to store clinical data as it prefers. The norm is divided into five parts: 1. Reference model, 2. Archetype interchange specification, 3. Reference archetypes and term lists, 4. Security, and 5. Exchange models.
The ISO 13606 norm is based on a dual information/knowledge model, and its design departs radically from previous norms. The dual model comprises two models: the reference model and the archetype model. The reference model represents the structures used to store information, while the archetype model is used to generate structures that store domain knowledge. These structures are called archetypes.
The Reference Model (RM) defines the structures necessary to organize the information. The most general structure is the extract, which contains the part of the EHR chosen to be transferred to another information system. Extracts are contained in messages, which are higher-level structures defined in part 5 of the standard. Extracts include demographic information to identify patients and all related agents, information about access policies, clinical information, and other types of auxiliary information such as audits or signatures.
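This containment hierarchy can be sketched informally; the following Python fragment is a simplified, hypothetical model of it (the class and field names are illustrative, not the normative RM class names):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Composition:
    """A simplified composition: one clinical document within the extract."""
    name: str
    entries: List[str] = field(default_factory=list)  # simplified clinical entries

@dataclass
class Extract:
    """The chosen part of the EHR to be transferred to another system."""
    subject_id: str                                        # patient identifier
    compositions: List[Composition] = field(default_factory=list)
    access_policies: List[str] = field(default_factory=list)  # access policy info
    audit_info: List[str] = field(default_factory=list)       # audits, signatures

extract = Extract(subject_id="patient-001")
extract.compositions.append(Composition(name="Discharge summary"))
print(len(extract.compositions))
```

The point of the sketch is only the containment relationship: an extract aggregates compositions plus demographic, policy and audit information, while messages (part 5 of the standard) wrap extracts for transport.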
reduce costs, to make the information interchange more efficient, and to provide a standard for information interchange when the patient is referred or transferred to another health professional. The CCR is used to refer a patient from one department to another, to maintain a personal health record, and to store a record of an episode of care and present it on future occasions. The CCR contains three main components: the header, the body and the footer. The header includes fields such as identifier, language, version, date/time, patient identifier, from, to and purpose. The body includes sections on clinical status and administrative aspects. The footer includes other extra fields such as actors, references, comments and signatures.
Our group carried out an exhaustive study of how to represent the CCR and its fields in an EN 13606 extract, for possible transmission among different information systems [7]. The result was that the CCR can be represented as a composition within an extract, because it is a document created in a single interaction with the information system. This composition consists of three main sections, corresponding to the three main parts of the CCR: header, body and footer. This is shown in figure 2. Each section is modeled by a different archetype.
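As a rough illustration of this mapping (the archetype identifiers below are invented; [7] remains the authoritative source), a CCR document could be folded into a single composition with one archetyped section per part:

```python
# Illustrative sketch: a CCR-like dict is mapped to a composition containing
# one section (archetyped node) per CCR part. Archetype ids are hypothetical.

def ccr_to_composition(ccr: dict) -> dict:
    """Map a dict with 'header', 'body', 'footer' keys to a composition."""
    return {
        "composition": "ContinuityOfCareRecord",
        "sections": [
            {"archetype": f"CEN-EN13606-SECTION.ccr_{part}.v1", "content": ccr[part]}
            for part in ("header", "body", "footer")
        ],
    }

comp = ccr_to_composition({"header": {"language": "en"}, "body": {}, "footer": {}})
print([s["archetype"] for s in comp["sections"]])
```

Because the CCR is produced in a single interaction, the whole document fits naturally inside one composition rather than being scattered across several.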
large research projects and clinical trials on different pathologies (arterial hypertension [10], oral anticoagulant treatment [11] and asthma [12]). In these projects, patients' personal information is stored in local databases. Moreover, the demographic information used can vary from project to project.
For these reasons, our research group designed and developed an external, independent demographic server compliant with the ISO 13606 norm. This server is accessible to different information systems for managing their demographic data. Thus, a separation between the storage of clinical data and of patients' personal data is achieved, as established by data protection law.
From now on, our group intends for new research projects in this line to be integrated with the demographic server, conceptually and physically separating the storage and management of clinical and demographic information. This is a first step towards demonstrating to health authorities the need for, and viability of, external and independent demographic servers.
The design and development of the demographic server exploited the work and experience gained with the EHR server. Many technologies are shared by both servers: Java as the programming language, the ODBC database interface (MySQL), SAX for parsing, and a communication layer based on Web Services implemented with the Axis tool.
The demographic server focuses on the classes of the demographic package and the support package of the EHR server. Its functions are accessible to clients through a web service. These functions are the well-known storeExtract and retrieveExtract functions from the EHR server, plus new functions specific to the management of demographic data from the telemedicine projects:
- retrieveIdentifiedEntity: returns an extract with all the demographic information of a patient, linked by one of the patient identifiers stored in the system.
- getPatientName, getPatientFullName and getPatientAllData: return different demographic data depending on the needs of each client. The input arguments are a patient identifier and the project identifier.
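A client-side view of these operations might look like the following sketch; the transport (the real server exposes them as Axis Web Services) and the stored data shapes are assumptions made for the example:

```python
# Hypothetical in-memory stand-in for the demographic server's web service.
# Operation names follow the text; everything else is invented for illustration.

class DemographicServerStub:
    def __init__(self):
        self._patients = {
            "p-001": {"name": "Ana", "full_name": "Ana García", "dob": "1970-01-01"},
        }

    def retrieveIdentifiedEntity(self, patient_id):
        # Would return a full EN 13606 demographic extract; simplified here.
        return self._patients.get(patient_id)

    def getPatientName(self, patient_id, project_id):
        return self._patients[patient_id]["name"]

    def getPatientFullName(self, patient_id, project_id):
        return self._patients[patient_id]["full_name"]

    def getPatientAllData(self, patient_id, project_id):
        return dict(self._patients[patient_id])

server = DemographicServerStub()
print(server.getPatientFullName("p-001", "asthma-trial"))
```

The project identifier parameter reflects the fact that each telemedicine project may request a different subset of demographic data from the same server.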
The need for, and concept of, the demographic archetype arose during the development of the demographic server. A demographic archetype is a way to represent knowledge that limits the many possibilities of the ADL language over the classes of the demographic package. Information could thus be sent to the server according to specific archetypes, and demographic information could be requested from the server with only a patient identifier and a specific archetype. In this way, the server would only return the requested data in the indicated format. Each project, or each application, would define its own archetypes modeling the demographic information it uses. The server could be used with different archetypes without modification.
References
1. CEN (Technical Committee 251). http://www.centc251.org (accessed Apr. 2009).
2. A. Muñoz, R. Somolinos, J. A. Fragua, C. H. Salvador. Servidor de historias clínicas electrónicas conforme a la norma EN 13606. Informática y Salud, n. 51, March 2005, 47-52.
Abstract
The construction of a Virtual Federated Electronic Health Record (VFEHR) requires using standards, tools
and an adequate technological infrastructure. We have developed LinkEHR as a framework platform for
the standardization, integration and sharing of health information among distributed and
heterogeneous Health Information Systems. To perform this task, LinkEHR trusts in archetypes as a
mechanism for semi-authomatic normalization of legacy data. This framework has already been
evaluated in existing health institutions for the construction of standardized extracts of the EHR.
Keywords: Health Information Systems, Standardization, Electronic Health Record, Archetype
1. Introduction
For many years, the standardization of health information systems was an added value with no direct influence on the daily work of healthcare institutions. Many other problems, such as the basic digitalization of the diverse clinical information, were more urgent. But healthcare institutions are no longer a closed environment. Nowadays, information sharing is not the exception but the rule. Obtaining a unified and universal electronic health record (EHR) for each person is one of the most important objectives of health informatics. This Virtual Federated EHR (VFEHR) should include all the existing information related to a person from birth to death, independently of the place where the patient has received care. Solving this problem requires interconnecting all the information systems and reaching an agreement on the format of the transmitted information. Not only the syntax but also the meaning of the information matters, since the latter ensures a correct interpretation by human readers or computer systems. This is called semantic interoperability, and it is mainly based on the use of ontologies or medical terminologies and on the formal definition of the domain concepts that will be used by the HIS. Both problems are addressed by the CEN EN13606 standard for EHR communication.
2. CEN EN13606
The CEN EN13606 norm [1] is a five-part standard developed by the European Committee for
Standardization (CEN) intended for the communication and semantic interoperability of EHR
extracts among heterogeneous HIS.
Reference Model
The first part of the standard develops an information reference model (RM), which describes a
generic model for representing any clinical annotation of the EHR. It specifies how health data
should be aggregated to create more complex data structures, and the context information that
must accompany every piece of data in order to meet ethical and legal requirements. The RM also
stores context information of the clinical events together with clinical information. For example, it
supports information about the subject of care, the place and date of the clinical event, and the
participants in the clinical act.
Archetype Model
The second part of the standard defines an Archetype Model (AM). Archetypes are formal
definitions of domain-level concepts in the form of structured and constrained combinations of
the classes contained in the reference model. Their principal purpose is to provide a powerful,
reusable and interoperable mechanism for managing the creation, description, validation and
query of EHRs. For each domain concept, a definition can be developed in terms of constraints on
structure, types, values, and behaviors of business objects. Basically, archetypes are a means for
providing semantics to data instances that conform to some reference model by assuring that data
obey a particular structure and satisfy a set of semantic constraints.
Examples of archetypes can include prescriptions, problem lists, differential diagnosis, pregnancy
reports or blood pressure observations. In fact, any desired archetype can be defined when it is
needed.
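The idea that an archetype is a set of constraints over reference-model instances can be shown with a toy example; the constraint structure and field names below are invented for illustration (real archetypes are expressed in ADL over RM classes):

```python
# A toy "archetype" constraining a blood-pressure observation: each field
# must be of a given type and fall within a plausible range.

archetype = {
    "systolic": {"type": int, "min": 0, "max": 300},
    "diastolic": {"type": int, "min": 0, "max": 200},
}

def conforms(instance: dict, archetype: dict) -> bool:
    """Check that an instance obeys the structure and value constraints."""
    for field_name, rule in archetype.items():
        value = instance.get(field_name)
        if not isinstance(value, rule["type"]):
            return False
        if not (rule["min"] <= value <= rule["max"]):
            return False
    return True

print(conforms({"systolic": 120, "diastolic": 80}, archetype))  # plausible reading
print(conforms({"systolic": 999, "diastolic": 80}, archetype))  # out of range
```

Data that passes the check "obeys a particular structure and satisfies a set of semantic constraints", which is exactly the role archetypes play over a reference model.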
3. The LinkEHR approach
In most health organizations the coexistence of several heterogeneous information sources in
terms of platform, structure (data model) and semantics is a common scenario. They were created
and are maintained to fulfill the requirements of a particular set of users or department. As a
consequence they are suitable for the department or application they were created for but not for
other users or applications that may also need to make use of the information held by these
systems.
In order to take maximum advantage of the information, it is necessary to transform source data to meet the data format of the target applications. This problem is known in the literature as data exchange: the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible [2,3]. The effort required to create and manage such transformations is considerable: it involves writing and maintaining complex data transformation programs and keeping up with changing sources.
The first step deals with the importation of reference models. In LinkEHR-Ed a new reference
model expressed as W3C XML Schema can be imported at any time. Obviously, this step only
needs to be performed once for each reference model. Therefore, it is possible to define
archetypes based on different standards. Three different reference models have been tested
successfully, namely CEN EN13606, OpenEHR and CCR. To the authors’ knowledge, LinkEHR-Ed is
the only editor capable of handling CEN EN13606.
The second step is the actual archetype editing process. New archetypes can be edited either from scratch or by specialization of an existing one. Our job was to define a formal modeling framework as a prerequisite for implementing a tool providing enhanced support for the editing of archetypes.
The third step concerns mapping specification. Since the health data to be published resides in the underlying data sources, it is necessary to define mapping information that links archetype entities to data elements in the data repositories (e.g. tables and attributes in the case of relational data sources, or elements and attributes in the case of XML data). A mapping between an archetype and a source schema is specified as a set of value mappings, which define how to obtain a value for an atomic attribute of an archetype from a set of atomic elements of the data sources, applying, if necessary, some transformation functions. For instance, this includes functions for transforming source time and date values into values conforming to the international standard ISO 8601 for date and time representation.
Finally, the fourth step is the generation of data transformation programs. The actual transformation of source data into archetype instances is performed by an XQuery program that is automatically generated from the mapping specification. Its execution over a set of data sources yields an XML instance that satisfies the constraints imposed by the archetype and, at the same time, is compliant with the underlying reference model.
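A value mapping with a transformation function (step three) and its application (step four) can be sketched as follows; in LinkEHR the generated program is XQuery, but the same logic is shown here in Python, with invented attribute and column names:

```python
from datetime import datetime

def to_iso8601(source_date: str) -> str:
    """Transformation function: this hypothetical source uses DD/MM/YYYY."""
    return datetime.strptime(source_date, "%d/%m/%Y").date().isoformat()

# Value mappings: archetype attribute -> (source column, transformation).
value_mappings = {
    "observation.time": ("visit_date", to_iso8601),
    "observation.value": ("bp_systolic", int),
}

source_row = {"visit_date": "16/05/2008", "bp_systolic": "120"}

# "Executing" the mapping produces an archetype-conformant instance.
instance = {attr: fn(source_row[col]) for attr, (col, fn) in value_mappings.items()}
print(instance["observation.time"])  # → 2008-05-16
```

Each entry plays the role of one value mapping: it names the source element, the target archetype attribute, and the transformation (here, a date normalized to ISO 8601) applied in between.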
[Figure: system overview showing the HIS and its clinical information, an index, an extract server and an EHR viewer]
4. Examples of application
Several examples of the application of this methodology have been developed during the last year
in collaboration with healthcare institutions: the Hospital General Universitario de Valencia and
the Hospital de Fuenlabrada, in Madrid.
A. Raffio, D. Braga, S. Ceri, P. Papotti, M. A. Hernández. "Clip: a Visual Language for Explicit Schema Mappings". Presented at the 24th Int. Conf. on Data Engineering, Cancún, México.
J. A. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, and J. T. Fernández-Breis. "Framework for clinical data standardization based on archetypes". Proceedings of the 12th World Congress on Health (Medical) Informatics (MedInfo'07), pp. 454-458.
Plan de calidad del Sistema Nacional de Salud. Ministerio de Sanidad. http://www.msc.es/organizacion/sns/planCalidadSNS/tic02.htm
Abstract
Semantic interoperability of clinical standards is a major challenge in the eHealth across Europe,
because this would allow healthcare professionals to manage the complete EHR of patient. Archetypes
are considered a cornerstone to deliver fully interoperable EHRs. Our work is focused on the
development of ontology-based methods and techniques for providing semantic interoperability
between diferent EHR standards at archetype level. Hence, solutions for the semantic representation,
transformation and management of clinical archetypes are described in this work.
1. Introduction
The lifelong clinical information of a person, supported by electronic means, constitutes his or her Electronic Healthcare Record (EHR). Nowadays there are different advanced standards and architectures [1] for representing and communicating EHRs, such as HL7 [2], OpenEHR [3] and UNE-EN 13606 [4]. Some of these advanced EHR standards, such as OpenEHR and UNE-EN 13606, make use of the dual-model architecture approach [5]. This architecture is based on two modelling levels: information and knowledge. The information level is provided by the reference model and the knowledge level by the archetype model. Archetypes define clinical concepts and are usually built by domain experts. They are a tool for building clinical consensus in a consistent way. The semantic interoperability of clinical standards is a major challenge in eHealth across Europe, because it would allow healthcare professionals to manage a patient's complete EHR. Clinical archetypes are fundamental for achieving semantic interoperability, and they are built for particular EHR standards.
Our recent work has focused on the development of methods and techniques for providing semantic interoperability between different EHR standards at the archetype level. First, a methodology for obtaining a semantic representation of archetypes will be presented, describing how syntactic archetypes can be automatically transformed into semantic ones. Next, the semantic interoperability of two dual-model-based standards, UNE-EN 13606 and OpenEHR, will be addressed. Finally, the development of a prototype for the semantic management of clinical archetypes will be described.
2. An Ontological Representation of Clinical Archetypes
Clinical archetypes are defined using the Archetype Definition Language (ADL). This is a generic language that does not support performing semantic activities over archetypes. Nevertheless, such activities require the exploitation of information and knowledge: comparison, classification, and the integration of information and knowledge coming from different, heterogeneous systems. These activities are therefore knowledge-intensive: they require the semantic management of knowledge and information, and semantic interoperability between such heterogeneous systems. The advances in the Semantic Web [6] community make it a candidate technology for supporting such knowledge-intensive tasks related to archetypes and EHR systems. This section aims at providing a mechanism for representing archetypes in a manner manageable by Semantic Web technologies.
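The gist of such a representation can be sketched minimally: an archetype becomes a set of subject-predicate-object triples over a small vocabulary, which generic semantic tooling can then query. The vocabulary terms below are invented for the example; the actual work targets OWL [9]:

```python
# A tiny triple store: each archetype is described by a handful of triples.
triples = set()

def add_archetype(archetype_id, standard, concept, language):
    """Register an archetype as triples over an invented vocabulary."""
    triples.add((archetype_id, "rdf:type", f"{standard}:Archetype"))
    triples.add((archetype_id, "models", concept))
    triples.add((archetype_id, "language", language))

add_archetype("openEHR-EHR-OBSERVATION.blood_pressure.v1",
              "openehr", "BloodPressure", "en")

# A generic query any triple store could answer: archetypes written in English.
english = [s for (s, p, o) in triples if p == "language" and o == "en"]
print(english)
```

Once archetypes live in such a representation, the semantic activities mentioned above (comparison, classification, integration) reduce to generic queries and reasoning over the triples rather than ad hoc ADL parsing.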
mechanisms make use of semantic similarity functions for this purpose. In general, two main kinds of search can be performed: for similar archetypes, and for archetypes holding certain properties. The global similarity search looks for archetypes similar to a given one by performing semantic comparisons in the context of the archetype ontology available for the particular standard. Archetypes are instances of that ontology, so instance-comparison mechanisms are used. These mechanisms take into account the following categories: conceptual proximity, property similarity, annotation similarity and linguistic proximity. Alternatively, the user can search for archetypes holding certain properties, which can be either definitional properties or annotations.
On the one hand, we might be looking for archetypes written in English, or archetypes including an element measured in a certain unit. On the other hand, we might be looking for archetypes related to a particular disease, such associations being established through a classifier resource.
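One plausible way to combine the four comparison categories into a single global similarity score is a weighted sum; the weights and the per-category scores below are placeholders, not the actual ArchMS functions:

```python
# Invented weights for the four comparison categories named in the text.
WEIGHTS = {
    "conceptual_proximity": 0.4,
    "property_similarity": 0.3,
    "annotation_similarity": 0.2,
    "linguistic_proximity": 0.1,
}

def global_similarity(scores: dict) -> float:
    """Combine per-category similarities (each in [0, 1]) into one score."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

s = global_similarity({
    "conceptual_proximity": 1.0,   # same concept in the archetype ontology
    "property_similarity": 0.5,    # half the properties match
    "annotation_similarity": 0.0,  # no annotation overlap
    "linguistic_proximity": 1.0,   # identical terms
})
print(round(s, 2))  # → 0.65
```

Candidate archetypes can then be ranked by this score against the query archetype, with the weights tuned to how much each category should influence the ranking.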
5. Conclusions
Providing an OWL representation for archetypes allows semantic activities such as comparison, classification, selection or consistency checking to be carried out more efficiently. Here, an overview of our work towards semantic interoperability between archetypes has been presented. The first step was the design and implementation of a methodology for the transformation of ADL archetypes into semantic archetypes expressed in OWL. This methodology has been applied to two dual-model-based EHR clinical standards: UNE-EN 13606 and OpenEHR.
After that, a similar technological solution was applied for transforming OpenEHR archetypes into UNE-EN 13606 archetypes and vice versa. Finally, the ArchMS system for annotating archetypes and performing different types of semantic searches has been presented.
References
1. Blobel, B.: Advanced EHR architectures: promises or reality. Methods of Information in Medicine 45(1) (2006) 95-101
2. HL7: http://www.hl7.org
3. OpenEHR: http://www.openehr.org
4. UNE-EN13606: http://www.centc251.org
5. Beale, T.: Archetypes and the EHR. Stud Health Technol Inform 96 (2003) 238-244
6. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American, May (2001) 29-37
7. Schulz, S., Hahn, U.: Part-whole representation and reasoning in formal biomedical ontologies. Artificial Intelligence in Medicine 34(3) (2005) 179-200
8. Smith, B.: From concepts to clinical reality: An essay on the benchmarking of biomedical terminologies. Journal of Biomedical Informatics 39(3) (2006) 288-298
9. OWL-REF: http://www.w3.org/tr/owl-ref/
10. Fernandez-Breis, J., Vivancos-Vicente, P., Menarguez-Tortosa, M., Moner, D., Maldonado, J., Valencia-Garcia, R., Miranda-Mena, T.: Using semantic technologies to promote interoperability between electronic healthcare records' information models. In: Proc. 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society EMBS '06 (Aug. 30-Sept. 3, 2006) 2614-2617
11. Kurtev, I., Bezivin, J., Aksit, M.: Technological spaces: an initial appraisal. In: CoopIS, DOA'2002 Federated Conferences, Industrial track (2003)
12. Martinez-Costa, C., Menarguez-Tortosa, M., Fernandez-Breis, J., Maldonado, J.: A model-driven approach for representing clinical archetypes for semantic web environments. Journal of Biomedical Informatics 42(1) (2009) 150-164
13. ADL-Parser: http://www.openehr.org/svn/ref_impl_java/trunk/adlparser/
14. EMF: http://www.eclipse.org/emf/
15. OMG: Ontology metamodel definition specification. http://www.omg.org/cgibin/doc?ad/2006-05-01.pdf (2006)
16. Protege: http://protege.stanford.edu/
17. Sanchez-Cuadrado, J., Garcia-Molina, J., Menarguez-Tortosa, M.: RubyTL: A practical, extensible transformation language. In: ECMDA-FA (2006) 158-172
18. MOFScript: http://www.eclipse.org/gmt/mofscript/
19. Fernandez-Breis, J.T., Menarguez-Tortosa, M., Martinez-Costa, C., Fernandez-Breis, E., Herrero-Sempere, J., Moner, D., Sanchez, J., Valencia-Garcia, R., Robles, M.: A semantic web-based system for managing clinical archetypes. Conf Proc IEEE Eng Med Biol Soc 2008 (2008) 1482-1485
Abstract
The privacy of personal health information is the target of many efforts by Health Information Systems administrators. But every person has the right to access and control the security rules governing his or her own information. In this work we propose a framework for the definition of access policies oriented to use by the legal owners of the data: the patients. At the same time, the framework guarantees some degree of decision and control to other levels of responsibility: the organization acting as custodian of the information, and the health professionals responsible for generating and incorporating the clinical data of a patient. Finally, this framework is based on the CEN EN13606 standard to ensure the interoperability of the defined access policies.
Keywords: Health Information Systems, Standardization, Electronic Health Record, Archetype, Security, Access Policy
1. Introduction
The security and privacy of health information have become one of the most relevant aspects of Health Information Systems (HIS). Sharing information among different health institutions is a daily reality and must be protected. At the same time, awareness of the value of our own data and of the rights we have over it is increasing. Different laws and regulations on this matter have appeared during the last decade, at national and international levels. We can highlight two main international regulatory frameworks: Directive 95/46/EC, regarding the processing and communication of personal data in the European Union [1], and the Health Insurance Portability and Accountability Act (HIPAA) of the United States of America [2]. They define which information about health status, provision of health care, or payment for health care can be linked to an individual and thus has a special protection status.
Another group of applicable laws refers to the protection of personal data and the rights that every person has over his or her own information. Basically, these laws provide a way in which individuals can enforce control over information about themselves: the right of access, the right to have factually incorrect information corrected, and the right to have stored information deleted.
We need to define a common technical framework to realize all these rights in a simple and, preferably, standardized manner. This will enable true empowerment of patients in the definition of access rules to their own clinical information.
[Figure: an organizational access policy applied to normalized EHR data in the HIS, with selection of relevant archetypes (CA1 … CAn) by archetype node]
5. Conclusions
We have presented a standards-based framework for the definition of interoperable, patient-
empowered access policies. The consecutive refinement of APAs from the organization down to the
patient who owns the data ensures that every level of responsibility can include its own opinions,
rules or limitations. The use of standard representations permits refining access policies to the
desired level of complexity by using automatically generated administration forms. Moreover, the
use of archetypes and standardized clinical information allows access policies to be communicated
among different health institutions. Some open questions must be solved in order to implement this
framework adequately. A common definition of roles and of professional and organization identifiers
must exist. An ontology representing the different access policy concepts, such as “allow”,
“disallow”, “read”, “write”, etc., is also needed. Finally, we will also require mechanisms
which can guarantee that the health information filtered by applying the APAs is still a clinically
valid EHR extract.
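As an illustration of the consecutive refinement described above, the following sketch (plain
Python, with invented role, action and archetype names that are not part of EN 13606 or any
standard) shows how a patient-level rule can further restrict what the organizational policy allows:

```python
# Hypothetical sketch of the consecutive refinement of access policy
# archetypes (APA): each level (organization, department, patient) may add
# rules, and evaluation applies the most specific matching rule, so the
# patient's own rules can override what the organization allows.
# All names below are illustrative, not part of any standard.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    role: str          # e.g. "physician", "nurse"
    action: str        # e.g. "read", "write"
    archetype: str     # archetype node the rule applies to
    effect: str        # "allow" or "disallow"

def decide(layers, role, action, archetype, default="disallow"):
    """Walk the APA layers from organization to patient; the last
    (most specific) matching rule wins."""
    effect = default
    for layer in layers:                      # org -> dept -> patient
        for rule in layer:
            if (rule.role, rule.action, rule.archetype) == (role, action, archetype):
                effect = rule.effect
    return effect

org = [Rule("physician", "read", "MedicationList", "allow")]
patient = [Rule("physician", "read", "MedicationList", "disallow")]

# The organization allows the access, but the patient's refinement forbids it.
print(decide([org, patient], "physician", "read", "MedicationList"))  # disallow
print(decide([org], "physician", "read", "MedicationList"))           # allow
```

The last-match-wins ordering is one simple way to encode the refinement chain: later (more
specific) levels can only be overridden by levels evaluated after them.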
Acknowledgments
This work was supported in part by the Spanish Ministry of Education and Science under Grant
TSI2007-66575-C02-01 and by the Generalitat Valenciana under grant APOSTD/2007/055.
References
[1] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the
protection of individuals with regard to the processing of personal data and on the free movement
of such data. Official Journal of the European Communities, No. L281, 23-11-1995.
[2] Health Insurance Portability and Accountability Act of 1996, Public Law 104-191 (1996).
[3] European Committee for Standardization: Health informatics - Electronic health record
communication. EN 13606 (2006).
Abstract
Explicit knowledge representation and management has been proposed as a key element to solve some
of the current challenges in health informatics. In parallel, major advances have taken place in
recent years in the field of knowledge management tools and methodologies, some of which have
achieved enough maturity to be implemented in real environments.
OntoDDB is a knowledge-management-based framework for the definition and modeling of data
repositories. All required information is gathered into an ontology in OWL format. The user
interface is built automatically, on the fly, in the form of web pages, whereas data are stored in a
generic repository. This allows the immediate deployment and population of the database, as well as
the online availability of any modification.
Keywords: Ontology, Clinical repositories, knowledge management
1. Introduction
Health Sciences in general, and Medicine in particular, are sciences based upon information and
communication. A large part of clinical practice and research consists of gathering,
summarizing and using information that, properly integrated with clinical knowledge, constitutes
the basis for decision support and the generation of new knowledge. Nevertheless, and in spite of
great advances in the information and communication technologies (ICT) domain during the last
years, progress in Medical Informatics has been slower than predicted and clinical information
systems are failing to provide true support for clinicians' needs [1,2].
Many causes have been identified to explain this situation and, as expected in a domain so related
to knowledge, several authors point towards knowledge management as the way to solve the open
challenges in Medical Informatics [3]. Following the path data → information → knowledge, it is
possible to raise the abstraction level when looking at reality, opening new horizons for
progress in the application of ICT to clinical information processing.
Great advances have taken place in recent years that allow us to envisage a solution to overcome
the impasse of Medical Informatics. One of them is the use of ontologies in the field of knowledge
representation. Although there are not many examples of real implementations yet, there is
hardly any research area of Artificial Intelligence not using them [4–7], and their presence is more
and more common in the healthcare field [8].
Another is the progress in standardization. Now that the more "technical" standards, such as HL7
2.x or DICOM, are well established in many installations, standards in medical vocabularies, like
SNOMED CT [9], and in information models [10] are beginning to lay the foundations for information
interchange and reuse, giving full meaning to knowledge management as the engine that empowers the
Health Informatics scenario.
Corresponding author: Raimundo Lozano, Informática Médica, Hospital Clínic de Barcelona, 08036 Barcelona, Spain.
Tel.: +34 93 227 92 06; E-mail: rlozano@clinic.ub.es
OntoDDB was designed with the aim of advancing along the path of explicit knowledge management, as
a proof of concept, with the following premises:
- A clear-cut separation between information and knowledge.
- A formal declaration of knowledge about the information to be stored as data, making it explicit.
- The capability to manage explicit knowledge.
- The capability to build knowledge-driven systems.
- A greater abstraction level, which allows a greater expressiveness level when building systems.
The chosen context is a real case of clinical information usage. The Hospital Clínic de Barcelona
(HCB) carries out a large amount of biomedical research with a high impact factor in the scientific
literature [11]. In recent years, a research project cannot be conceived without some form of ICT
support. Obviously, each project has specific requirements regarding the subject and objectives of
the research, but there is always a set of common requirements related to information systems
support: all projects need to enter and store structured data for later analysis, in a
distributed-access context and with a high rate of knowledge turnover.
2. Objectives
OntoDDB is a system designed to build clinical data repositories for research, based on knowledge
engineering techniques.
In recent years, systems for gathering clinical data for research purposes were built using a
multi-tier architecture composed of a centralized database, an application server and a web server
providing the user interface. Although this architecture meets the basic requirements of this kind
of project, it presents some disadvantages. First of all, it is costly: the investment in
the development of a medium-size application easily reaches several tens of thousands of euros.
Moreover, the maintenance cost is very high too, because any modification implies changes to the
database, the web pages and the application. Secondly, the investment is made for a very short
period of time: research projects typically last 2-3 years, and this time has to include the
development phase of the system, which tends to be long. Thirdly, this classical approach
requires a very specialized team of computer technicians, which implies a big gap between the
biomedical researcher and the development team, hindering communication between
functional and technical people. And last but not least, this kind of approach, within an
organization, produces great heterogeneity among the different applications devoted to research
projects, without any reuse of components.
The general objective of OntoDDB is to test the capabilities of a system driven by knowledge
management in order to meet the requirements for gathering data in a research context while
avoiding the disadvantages of more traditional approaches like those exposed above. The
underlying idea is that, by making explicit the usually implicit knowledge about these systems, it
is possible to raise the abstraction level in the design and management phases, allowing easier and
cheaper construction and maintenance of these systems without losing functionality and while
lengthening their life-cycle.
The system should achieve some specific objectives, such as:
- To allow the data model specification by means of an ontology
- To allow the user interface specification by means of an ontology
- To get automatic data storage and user interface generation
- To allow data model and user interface modifications in real time
- To allow extracting data for analysis purposes
- To allow distributed access to the system
3. Methods
The core of OntoDDB is the use of ontologies for knowledge representation. They are stored in a
database accessible to the rest of the system.
OntoDDB is composed of four modules and a metamodel:
- A module for editing the ontology
- A module for storing data and ontologies
- The metamodel of the system
- A module for building the user interface
- A module for data extraction
To extract information from the database, a set of functions allows access to the subclasses,
properties and instances of a class, the domain and range of properties, the values of instance
properties, etc.
Fig. 1: OntoDDB-MM.
Metaclasses
o Class_root: a subclass of owl.Class, used to define the entry points of the
application.
Metaproperties
o webDataProperty: a subclass of DatatypeProperty used to collect the
information needed to manage datatype properties. It introduces the following facets:
webColumn: window column in which to show the property.
webRow: window row in which to show the property.
webDescriptionProperty: a flag marking properties that are part of the
description of the corresponding object and are shown in the headers, etc.
webIdProperty: a flag marking the properties that constitute the Id of the
corresponding object.
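The facets above can drive the automatic generation of the web user interface. The following is a
hypothetical sketch (the property names, facet values and HTML output are invented for
illustration; this is not OntoDDB code) of how webRow and webColumn could determine the order of
form fields, and how the id and description flags could be used:

```python
# Illustrative sketch: properties annotated with webRow/webColumn facets
# are turned into form fields ordered by their (row, column) position.
# Property names and facet values are invented.

properties = [
    {"name": "diagnosisDate", "webRow": 2, "webColumn": 1,
     "webIdProperty": False, "webDescriptionProperty": False},
    {"name": "patientId", "webRow": 1, "webColumn": 1,
     "webIdProperty": True, "webDescriptionProperty": True},
    {"name": "diagnosis", "webRow": 2, "webColumn": 2,
     "webIdProperty": False, "webDescriptionProperty": True},
]

def render_form(props):
    """Emit one input field per property, ordered by (webRow, webColumn);
    id properties are rendered read-only, since they identify the object."""
    lines = []
    for p in sorted(props, key=lambda p: (p["webRow"], p["webColumn"])):
        readonly = " readonly" if p["webIdProperty"] else ""
        lines.append(f'<input name="{p["name"]}"{readonly}/>')
    return "\n".join(lines)

def header_of(props):
    """Description properties are the ones shown in the object headers."""
    return [p["name"] for p in props if p["webDescriptionProperty"]]

print(render_form(properties))
```

Because the layout lives in the ontology, moving a field is a matter of changing a facet value, not
of editing a web page.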
The other part of the module is a small application which accesses OWL-DB once the connection
parameters have been defined. It interprets the instances of the Data_extraction class and creates
a set of text files in CSV format, one per instance of Data_extraction, extracting the intended
data. For the properties appearing as values of class_properties, the module descends into the
referenced object and extracts all of its properties recursively. For the rest of the properties,
their value is extracted, which in the case of object properties is the identifier of the object
acting as value. These files can be imported into a conventional relational database to be
analyzed.
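The recursive flattening just described can be sketched as follows (the instance data, the property
names and the `expand` set are invented; OntoDDB reads the equivalent information from the
Data_extraction instances stored in OWL-DB):

```python
# Hedged sketch of the extraction step: for properties listed in `expand`
# the referenced object is followed recursively and its own properties are
# flattened into the row; for any other object property, only the
# identifier of the referenced object is written.

import csv
import io

instances = {
    "visit1": {"date": "2009-04-29", "patient": "p42", "exam": "ex7"},
    "ex7": {"type": "ultrasound", "result": "normal"},
}

def flatten(inst_id, expand, prefix=""):
    """Flatten one instance into a single row, descending into `expand`."""
    row = {}
    for prop, value in instances[inst_id].items():
        if prop in expand and value in instances:
            row.update(flatten(value, expand, prefix + prop + "."))
        else:
            row[prefix + prop] = value   # object props keep just the id
    return row

def to_csv(inst_ids, expand):
    """One CSV text per extraction, ready to import into a relational DB."""
    rows = [flatten(i, expand) for i in inst_ids]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(["visit1"], expand={"exam"}))
```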
3.6. Requirements
The hardware and software requirements to run OntoDDB are the following:
- Server: a simple PC with up to 2 GB of RAM and 40 MB of hard disk is enough. The operating
system does not matter, but a Java Virtual Machine is needed.
- Database server: tested on MS SQL Server®, Sybase® and Oracle®; no problems are envisaged with
other database management systems.
- Application server: Apache Tomcat® 5.5 or higher.
- Ontology editing: a PC with Protégé 2000® + the OWL-DB plug-in.
- Client: MS Internet Explorer® 6 or higher.
4. Results
In order to evaluate the suitability of OntoDDB, we have used it in a real research project. VALID
is a project to gather clinical data from patients affected by Budd-Chiari syndrome or portal
venous thrombosis in order to advance the knowledge about these diseases. It is managed and
financed by a French research group on liver vascular diseases ("Centre de Référence des
Maladies Vasculaires du Foie").
The VALID project stems from a bigger European-funded project called EN-Vie [17], which has been
using a traditional architecture for its information system. Although somewhat less data are to be
gathered in VALID than in EN-Vie, the functional requirements are very similar. This fact allows us
to compare the behavior of both systems.
Storing the more than 300 different items defined in VALID would require building a traditional
database with around 40 tables. With OntoDDB, all data model and storage requirements were
covered: an ontology consisting of 60 classes represents both the data model and the user
interface. Only some additional work was needed to fulfill the requirement of showing several
calculated fields on the web pages, a functionality which is not available in this version of
OntoDDB.
This project has been running in production for a year. Around 75 cases have been entered into the
system, which means more than 20,000 data items, and the first data extraction has been performed
without any special difficulty.
We are now using OntoDDB in a new project, currently in the testing phase, with the same good
results. In both projects, the flexibility provided by the system allowed us to have prototypes
available from the first moment, which is a very valuable resource for working closely with the
physicians. From the very beginning of the project, key users had material to work with, and it was
even possible to make on-line modifications and check the results immediately.
5. Discussion
The use of OntoDDB has several advantages. First of all, the application development phase
practically disappears, leaving only analysis and design, with prototypes available from the very
beginning. This implies a very important drop in costs and time, with the consequent savings.
Maintenance is also less expensive, not only for the same reason stated before but especially
because of the flexibility to make modifications very easily.
On the other hand, as the differences between applications are reduced to their conceptual models,
the same infrastructure can be shared, taking advantage of economies of scale; some elements or
models can be reused, and homogeneous criteria can be established inside an organization.
There are some further conceptual advantages derived from the use of ontologies and standards.
Making an ontological analysis of an application allows moving the focus of attention to a higher
abstraction level and concentrating on the domain aspects, helping the researchers to clarify the
implicit knowledge structure. The use of standards, like OWL, eases the interchange and reuse of
models.
The metamodel of OntoDDB has no capability for process representation, and it is not possible for
the moment to manage explicit knowledge related to processes. This weakness was well illustrated
by the impossibility of representing calculated fields.
OntoDDB has so far only been used in the clinical environment, but the model is totally independent
of the domain, so it would be suitable for gathering data in any context.
6. Future work
The first version of OntoDDB has served as a proof of concept: it has allowed us to demonstrate
that it is possible to detach the knowledge from the information when building a system.
Current work in progress aims to deepen the explicit representation of knowledge in order to
further separate the data model from the presentation layer, to incorporate more web
functionalities and, in the future, to incorporate processes.
Using this tool as the basis for other kinds of applications is envisioned too, as we consider that
OntoDDB has a more general range of possibilities than data gathering. In particular, we are now
working to build powerful and versatile knowledge servers able to cope with medical knowledge.
7. Conclusions
OntoDDB is a tool that allows modelling some aspects of reality and automatically creating a
database from that model to collect data. Still more interesting is that the knowledge gathered in
the database remains very well documented.
Having this functionality available allows sharing data in a very efficient way and allows us to
explore new ways of managing and sharing knowledge.
Acknowledgements
This research was partially supported by the Centre de Référence des Maladies Vasculaires Du
Foie, Paris.
References
[1] Ball MJ, Silva JS, Bierstock S, Douglas JV, Norcio AF, Chakraborty J, et al. Failure to Provide
Clinicians Useful IT Systems: Opportunities to Leapfrog Current technologies. Methods Inf Med
47:4-7 (2008)
[2] Lehmann CU, Altuwaijri MM, Li YC, Ball MJ, Haux R. Translational Research in Medical
Informatics or from Theory to Practice. Methods Inf Med;47:1-3 (2008).
[3] Shepherd M. Challenges in Health Informatics. Proc 40 HICSS;1-10 (2007).
[4] Smith B, Ceusters W. Ontology as the Core Discipline of Biomedical Informatics. Legacies of the
Past and Recommendations for the Future Direction of Research. Comput Ph Cognit Science;1-14
(2005).
[5] Knublauch H. Ontology-Driven Software Development in the Context of the Semantic Web: An
Example Scenario with Protégé/OWL. Stanford University; (2004).
[6] Boella G, van der Torre L, Verhagen H. Roles, an interdisciplinary perspective. Applied
Ontology;2[2]:81-8 (2007).
[7] Ferrario R, Prévot L. Formal ontologies for communicating agents. Applied Ontology;2[3-4]:209-
15 (2007).
[8] Bodenreider O, Burgun A. Biomedical ontologies. Medical informatics: Advances in knowledge
management and data mining in biomedicine.p. 1-25 (2005).
[9] http://www.ihtsdo.org/
[10] Muñoz A, Somolinos R, Pascual M, Fragua JA, Gonzalez MA, Monteagudo JL, et al. Proof-of-
concept Design and Development of an EN 13606-based Electronic Health Care Record Service. J
Am Med Inform Assoc Oct 26;14[1]:118-29 (2006).
[11] Asenjo MA, Bertrán MJ, Guinovart C, Llach M, Prat A, Trilla A. Analysis of Spanish hospitals'
reputation: relationship with their scientific production in different subspecialities. Med Clin
(Barc) 126(20):768-70 (2006).
[12] http://protege.stanford.edu/
[13] http://jena.sourceforge.net/
[14] OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-features/
[15] Anhoj J. Generic Design of Web-Based Clinical Databases. Journal of Medical Internet
Research Oct;5[4] (2003).
[16] Stephen B, Johnson PhD, Paul T, Khenina A. Generic database design for patient
management information. p. 22-6 (1997).
[17] http://www.biocompetence.eu/index.php/kb_1/io_3122/io.html
Abstract
Biomedical information processing related to medical records and clinical notes is a complex task
due to the nature of the documents (hand-written and semi-structured or non-structured data) and
the diversity of the terminology used. There are technologies that rely on standards to deal with
this kind of data in English; however, in the case of Spanish there are only a few initiatives. The
following paper briefly describes a tool to map Spanish medical terminology onto the meta-thesaurus
SNOMED CT; in addition, the tool's performance and architectural features are addressed, and a
short assessment is presented.
Keywords: semantic tagging, meta-thesaurus, SNOMED.
1. Foreword
Medical text processing has been one of the most interesting areas in the last few years, due to
several issues: first of all, the huge amount of scientific papers produced; next, the need to use
automatic tools to manage and search this documentation; and the complexity of processing the
different types of information drawn up by domain specialists. In most scenarios, such
documentation consists of several records composed of non-structured data. This data is usually
created manually (which may lead to some orthographic mistakes) and does not follow any naming
convention for the transcription of concepts or acronyms. Also, personal information is included in
most medical records, which causes a security hole because private information related to patients
or specialists might be revealed.
On the one hand, concerning English clinical notes, there are several well-performing tools and
meta-thesauri, such as MeSH and UMLS [1]; on the other hand, there is a lack of similar tools
focused on other languages such as Spanish. The ISSE project (FIT-350300-2007-75) provides a
SNOMED-based tool which attempts to recognize, within Spanish-written clinical notes, concepts
belonging to the meta-thesaurus SNOMED Clinical Terms.
The following sections address related work, a brief description of the tool, the tool's
assessment, and various conclusions and proposed future work.
2. Related works
Medical information technologies focus on clinical-note processing, trying to figure out new
treatments and drugs. For that purpose, several disciplines, such as Computer Science, Linguistics,
Biomedicine, Genetics, etc., should join forces to develop management and search applications.
Those applications should incorporate new medical resources; one of the most important steps in
this stage is the semantic tagging of documents, which is mandatory to reach the following stages.
The first step in document tagging consists of term identification or recognition; afterwards,
these terms are matched against the meta-thesaurus. System performance depends on the efficiency
of the linguistic processing and also on the coverage and quality of the chosen thesaurus. Using
thesauri such as SNOMED3 or UMLS [3], [4], which are considered standards, makes it possible to
rely on the quality they provide to multilingual semantic networks. There are several arguments for
using such thesauri; for example, they provide wider coverage than others like GALEN or MeSH [5].
However, in spite of their clear advantages, these terminologies do not cover all languages, which
works against non-English speakers, who must build their own terminologies in order to profit from
similar tools [6].
Biomedical domain records are written by human specialists who make a large number of mistakes due
to the use of symbols which might have several meanings and the use of non-normalized terms. Hence
it is necessary to add new resources such as spellcheckers and acronym dictionaries in order to
face these problems [7], [8].
3. Concept recognizer
The concept-matching unit sits inside a text pre-processing framework. This framework provides a
system in charge of retrieving semantic information from a set of clinical notes which act as the
system input. The described framework matches a set of sentences belonging to clinical notes
against the SNOMED thesaurus. This unit attempts to identify all terms in the input sentences which
are in the thesaurus; during this task, the system also recognizes synonymous and related terms.
The SNOMED concept recognizer performs quite similarly to other tools like Metamap4, which provides
concept recognition over UMLS, i.e. English concept matching. The main difference is that our
concept recognizer works over the Spanish section of SNOMED.
Regarding SNOMED storage, two solutions were proposed. The first is an index-based solution which
uses Lucene5 indexes to access SNOMED; SNOMED access is thus improved, because the inverted index
provides very good response times. Indexes have been built so as to allow querying several fields
of the SNOMED description table, such as Term, conceptId, etc. The second is a MySQL database
developed by the software development company Isoco6; this database includes the information of the
three SNOMED tables, so this solution provides wider coverage but worse response times than the
Lucene indexes.
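The advantage of the inverted index can be illustrated with a minimal sketch (the description-table
entries and concept ids below are invented; a real Lucene index additionally performs text
analysis, ranking and persistence):

```python
# Minimal inverted index over a SNOMED-like description table: each token
# maps directly to the set of conceptIds whose description contains it,
# so term lookup needs no scan of the full table.

from collections import defaultdict

descriptions = {          # conceptId -> description term (invented ids)
    "C001": "bacterial pneumonia",
    "C002": "viral pneumonia",
}

index = defaultdict(set)
for concept_id, term in descriptions.items():
    for token in term.split():
        index[token].add(concept_id)

def lookup(query):
    """Concepts whose description contains every query token."""
    tokens = query.lower().split()
    sets = [index[t] for t in tokens]
    return set.intersection(*sets) if sets else set()

print(lookup("pneumonia"))            # both concepts
print(lookup("bacterial pneumonia"))  # only the first concept
```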
At this point, the score formula will be explained in order to provide insight into system
performance and concept categorization. The formula is based on the one proposed by Patrick, J.,
Wang, Y. and Bud, P. [9], modified in order to fit the new requirements. The proposed formula is
depicted next:
3 SNOMED: The systematized nomenclature of medicine. http://www.snomed.org/
4 Metamap: http://mmtx.nlm.nih.gov/
5 Apache Lucene: http://lucene.apache.org/java/docs/index.html
6 ISOCO: http://www.isoco.com/
43
Proc. of the First Symposium on Healthcare Systems Interoperability (Alcalá de Henares, April 2009)
Score = matches² / (length(Q) × length(R))
where matches is the number of terms shared by Q (the query) and R (the retrieved concept).
Thus the score takes into account both the length of the query and the length of the retrieved
concept.
An example of the score computation follows. Suppose the query:
Q = "Bacterial pneumonia"
and a retrieved concept with an effective length of 4 (the word "The" is considered a stop word and
is ignored by the system). Both query terms match, so the final score, based on the previously
detailed data, is:
Score = 2² / (2 × 4) = 4/8 = 0.5
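A direct transcription of the formula, under the assumption (stated above) that stop words are
removed before lengths are computed; the retrieved description and the stop-word list here are
invented for illustration:

```python
# Sketch of the modified Patrick et al. score: the squared number of
# matched terms, normalised by the product of the query and concept
# lengths (stop words excluded from both).

STOP_WORDS = {"the", "of", "a"}   # illustrative subset

def tokens(text):
    """Lower-case tokens with stop words removed."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def score(query, retrieved):
    q, r = tokens(query), tokens(retrieved)
    matches = sum(1 for w in q if w in r)   # query terms found in the concept
    return matches ** 2 / (len(q) * len(r))

# Reproduces the worked example: 2 matches, lengths 2 and 4 -> 4/8 = 0.5
print(score("Bacterial pneumonia", "The acute bacterial pneumonia disorder"))
```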
4. Recognizer assessment
The system assessment was carried out over a set of 100 clinical notes previously hand-tagged by a
specialist. This set is considered a Gold-Standard, and the results yielded by the recognizer are
compared against it in order to assess system performance.
Due to the breadth of SNOMED, only two of its hierarchies were considered during this process:
"procedures" and "disruptions", which were considered the most relevant by the specialists.
The parameters taken into account during the evaluation were:
- Acceptance threshold: the minimum score that a retrieved concept should have in order to be
considered relevant.
- Number of retrieved concepts: the number of concepts retrieved for each query.
Several evaluation functions were also tried in order to figure out which one fits the
Gold-Standard best.
In order to achieve a complete evaluation of the tool, both complete-matching and partial-matching
techniques were tried. Partial matching consists of splitting the retrieved sentence into three
parts: the left one, the center one and the right one. Once this process has been accomplished, the
system checks whether some part matches the query and, if it succeeds, the sentence is considered
relevant for this query.
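The partial-matching check can be sketched as follows; splitting into three equal word windows is
an assumption here, since the exact split used by the tool is not specified:

```python
# Hedged sketch of partial matching: the retrieved description is split
# into left, center and right word windows, and the candidate is accepted
# if every query token appears in at least one window.

def parts(words):
    """Left, center and right word windows of a token list."""
    n = max(len(words) // 3, 1)
    return [words[:n], words[n:len(words) - n] or words, words[-n:]]

def partial_match(query, retrieved):
    """True if some window of the retrieved text contains all query tokens."""
    q = query.lower().split()
    return any(all(t in part for t in q) for part in parts(retrieved.lower().split()))

print(partial_match("pleural effusion",
                    "pleural effusion associated with another disorder"))
```

A complete match would require the query terms to cover the whole retrieved description; the
windowed check accepts long descriptions that merely contain the queried phrase, which is why it
raises both precision and coverage in Table 1.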
Hence, for each evaluation function, six experiments were carried out; their main settings were to
retrieve 1, 2 and 5 concepts per query and, for each setting, to use a threshold of 0.2 and of 0.4.
Finally, two approaches were followed, one performing complete matching and the other partial
matching; at the end, coverage and precision rates were estimated as depicted in Table 1.
The results show that precision and coverage experience a slight improvement when partial matching
is applied: with complete matching the precision rate is around 0.4 and the coverage rate around
0.08, whereas with partial matching both coverage and precision rates are higher.
Table 1. Assessment results

            Partial matching   Complete matching   Disruptions   Procedures
Precision        72%                43%               70%           35%
Coverage          9%                 6%               5.5%           7%
Taking into account the results explained above, the results are slightly better when using a
partial-matching technique; the system also performs better when retrieving disruptions rather than
procedures. This may be due to the system architecture, but also to the reliability of the
Gold-Standard file, which might be better for disruptions than for procedures. Analyzing the table
above, several conclusions may be drawn. First of all, the precision rates are good enough,
especially when partial-matching techniques are applied; the real problem arises when looking at
the coverage rates, which are lower than expected and always below 10 per cent. Thus, several
analyses should be carried out in order to improve those rates. There are several ways to analyze
and improve the results: first of all, analyzing the system behavior to check whether it is
performing as expected, and secondly checking the Gold-Standard reliability, which should be done
together with domain experts.
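For reference, the two rates reported in Table 1 can be computed from annotation sets as below (the
concept ids are invented; "coverage" here is the usual recall measure):

```python
# Precision: fraction of retrieved concepts that are in the Gold-Standard.
# Coverage (recall): fraction of Gold-Standard concepts that were retrieved.

def precision_coverage(retrieved, gold):
    hits = retrieved & gold
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    coverage = len(hits) / len(gold) if gold else 0.0
    return precision, coverage

gold = {"C1", "C2", "C3", "C4"}       # hand-tagged concepts (invented)
retrieved = {"C1", "C9"}              # concepts yielded by the recognizer

p, c = precision_coverage(retrieved, gold)
print(p, c)  # 0.5 0.25
```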
5. Future research work
In order to refine the system and obtain better results, future work includes building a repository
of medical resources which, based on dictionaries and ontologies belonging to the medical domain,
allows term recognition even when those terms are not included in SNOMED but are related to other
terms of the thesaurus.
Last but not least, it would be really useful to extend the Gold-Standard and the corpus scope to
reach higher reliability in the verification of results.
The main purpose of the whole work is to establish new semantic relationships which allow
retrieving knowledge and inferring new knowledge. Along this line, all the future works mentioned
above are crucial in order to reach our goals.
6. References
[1] Ananiadou, S. and McNaught, J. Text Mining for Biology and Biomedicine. Artech House, Inc.
(2006).
[2] Vintar, P. Buitelaar, M. Volk, Semantic relations in concept-based cross-language medical
information retrieval, in: Proceedings of the Workshop on Adaptive Text Extraction and Mining,
Cavtat-Dubrovnik (2003).
[3] Volk M., Ripplinger B., Vintar, S., Buitelaar, P., Raileanu, D., Sacaleanu, B. Semantic annotation
for concept-based cross-language medical information retrieval. International Journal of Medical
Informatics; 67(1): 97-112 (2002).
[4] Jang, H., Song S. K., Myaeng, S. H. Semantic Tagging for Medical Knowledge Tracking.
Proceedings of the 28th IEEE EMBS Annual International Conference. New York City, USA, Aug 30-
Sept 3 (2006).
[5] Ruch, P., Wagner, J., Bouillon, P., Baud, R., Rassinoux, A.-M., Robert, G. Medtag: Tag-like
semantics for medical document indexing. In Proceedings of AMIA'99, p. 35-- 42 (1999).
[6] Lu, W-H., Lin, R., Chan, Y-CH, Chen, K-H. Overcoming Terminology Barrier Using Web Resources
for Cross-Language Medical Information Retrieval. AMIA Annu Symp Proc.; 519–523 (2006).
[7] Schuler, K., Kaggal, V., Masanz, J., Ogren, P., Savova, G.. System Evaluation on a Named Entity
Corpus from Clinical Notes. In Proceedings of the Sixth International Language Resources and
Evaluation (LREC'08) (2008).
[8] Ogren, P., Savova, G., Chute, Ch. Constructing Evaluation Corpora for Automated Clinical
Named Entity Recognition. In Proceedings of the Sixth International Language Resources and
Evaluation (LREC'08) (2008).
[9] Patrick, J., Wang, Y., Bud, P. An Automated System for Conversion of Clinical Notes into
SNOMED Clinical Terminology. Proceeding of the fifth Australasian symposium on ACSW frontiers;
68: 219-226 (2007)
Abstract
Ontological representation of the health domain is widespread. In particular, the semantic
description of drugs is being tackled in several ongoing initiatives. However, the resulting
ontologies tend to be large and unmanageable. In recent years, recommendations point towards the
use of smaller, dynamic and interlinked ontologies to ease the ontology life-cycle. So far there
have been attempts to build ontologies using this approach, but with very little methodological and
tooling support. In this paper we propose to apply the notion of networked ontologies, together
with the methodology and tools developed within the NeOn project.
Keywords: Ontologies, networked, mappings, tools, knowledge management, nomenclature,
interoperability
1. Introduction
In recent years, there has been an increasing interest in semantic interoperability in e-Health.
Semantic interoperability is about sharing and combining data and health records among different
systems and actors. It is also related to fostering a consistent usage of terminology (drugs and
bio-medical knowledge bases) and the adoption of shared and standard models of clinical data. In
short, semantic interoperability addresses the underlying objective of formalizing health science
using shared or linkable models.
One of the key aspects to tackle in order to achieve semantic interoperability is the usage of
common or interoperable terminologies about drugs, diseases, treatments and so on. Different
actors (governmental bodies, hospitals, labs, key industries, etc.) should be able to understand the
terminology used by others. To complicate matters, it is quite common that different systems in
the same organization do not use the same terminology. In order to overcome this problem, over
the past years numerous initiatives, roadmaps and emerging standards have developed at an
increasingly rapid pace. SNOMED-CT [1] is emerging as a de facto terminological standard for many
international initiatives. Examples of this are the information models adopted in Australia (NEHTA)
[2], UK (NHS dm+d) [3] or USA. The W3C created in 2008 the Semantic Web Health Care and Life
Sciences (HCLS) Interest Group [4]. The EU financed the SemanticHEALTH FP6 project [5] with the
objective of delivering a Semantic Interoperability roadmap for Europe.
In particular, SemanticHEALTH issues recommendations such as interlinking health models and
terminologies by means of modular, multilingual, dynamic (just-in-time), collaboratively-designed
networks of ontologies [6] [7]. These recommendations also stress the methodological support
needed to specify high-quality, consistent and scalable ontologies. They also recommend the use of
the W3C standard ontology language, OWL [8], because a large and growing community is
developing tools and software (often freely available) that will benefit the integration
and maintenance of ontologies based on this language.
Corresponding author: Tomás Pariente Lobo, ATOS Research and Innovation, ATOS Origin
SAE, 28037 Madrid, Spain. E-mail: tomas.parientelobo@atosresearch.eu
The tooling and methodological support needed to foster the adoption of interoperable solutions,
especially when it comes to bridging the gap between huge terminologies, has not followed
such a rapid evolution. There are partial solutions that tackle one or several of the issues raised by
SemanticHEALTH, but far too little attention has been paid to delivering an overall
framework that covers most of the recommendations cited above.
2. The NeOn approach
NeOn [9] is an EU FP6 ICT-funded project whose aim is to create an open infrastructure, and an
associated methodology, to support the overall development lifecycle of large-scale, complex
semantic applications. This infrastructure is based on the notion of networked ontologies. A
network of ontologies is a collection of ontologies related to each other via a variety of
relationships, such as mapping, modularization, version, and dependency relationships [10]. NeOn
makes four main assumptions about ontologies: they are Dynamic (ontologies will evolve), Networked
(ontologies are interconnected via mappings, alignments or reuse), Shared (ontologies are shared
by people and applications), and Contextualized (ontologies depend on the context in which
they are built or used) [11].
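The network-of-ontologies model can be illustrated with a minimal sketch. The ontology names below are hypothetical; only the four relationship types come from the NeOn model, and the real NeOn infrastructure is Java-based rather than Python:

```python
from collections import defaultdict

# Relationship types named in the NeOn model: mapping, modularization,
# version and dependency relationships between ontologies.
RELATION_TYPES = {"mapping", "modularization", "version", "dependency"}

class OntologyNetwork:
    """A collection of ontologies related via typed relationships."""

    def __init__(self):
        self.ontologies = set()
        self.relations = defaultdict(set)  # kind -> set of (source, target)

    def relate(self, source, target, kind):
        if kind not in RELATION_TYPES:
            raise ValueError(f"unknown relationship type: {kind}")
        self.ontologies.update({source, target})
        self.relations[kind].add((source, target))

    def related(self, name):
        """All ontologies connected to `name` by any relationship."""
        out = {b for pairs in self.relations.values() for a, b in pairs if a == name}
        inc = {a for pairs in self.relations.values() for a, b in pairs if b == name}
        return out | inc

# Hypothetical drug terminologies plugged into one network.
net = OntologyNetwork()
net.relate("SNOMED-CT", "NationalDrugOntology", "mapping")
net.relate("NationalDrugOntology", "HospitalFormulary", "dependency")
```

The point of the structure is that each terminology stays a separate, manageable ontology; interoperability comes from traversing the typed links rather than from one monolithic model.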
NeOn has defined a service-based reference architecture that covers design and runtime aspects
of ontology engineering, plus the usage and integration of the networked ontologies into
semantic-enabled applications.
The methodological approach is twofold: on the one hand it provides ontology engineering
support, and on the other it provides guidance for developing applications using networked
ontologies. The use of publicly available Ontology Design Patterns [14] to improve the quality of
ontology design across a variety of scenarios and needs is also one of the most relevant outcomes
of the project.
The alignment between the NeOn objectives and the recommendations issued by SemanticHEALTH is
clear. NeOn does not specifically target the e-Health domain, taking instead a horizontal approach
valid for multiple domains; however, one of its case studies focuses on the pharmaceutical domain.
In particular, the pilot targeting the interoperability between different drug terminologies is the
so-called Semantic Nomenclature case study.
The Semantic Nomenclature case study tries to pave the way towards the use of a network of
ontologies to relate different drug terminologies. It defines an ontology network into which each
actor can potentially plug its own model as an ontology. All ontologies are interconnected and
mapped to each other in order to share information.
[4] Semantic Web Health Care and Life Sciences (HCLS) Interest Group,
http://www.w3.org/2001/sw/hcls/
[5] SemanticHEALTH project, http://www.semantichealth.org/
[6] SemanticHEALTH partners. Semantic Interoperability Deployment and Research Roadmap.
SemanticHEALTH SSA project Deliverable D7.1, 2008
[7] Rector A. Barriers, approaches and research priorities for integrating biomedical ontologies.
SemanticHEALTH SSA project Deliverable D6.1, 2008.
[8] Web Ontology Language (OWL), http://www.w3.org/2004/OWL/
[9] NeOn Project, http://www.neon-project.org/
[10] Haase P, Rudolph S, Wang Y, Brockmans S, 2006. Networked Ontology Model. NeOn
Deliverable D1.1.1
[11] Sabou M. et al, 2006. NeOn Requirements and Vision Deliverable
[12] Waterfeld W, Erdmann M, Schweitzer T, Haase P. Specification of NeOn architecture and API
V2. NeOn Deliverable D6.9.1, 2008
[13] NeOn Toolkit website http://www.neon-toolkit.org/
[14] Ontology Design Patterns, http://ontologydesignpatterns.org
[15] Herrero G, Pariente T. Revision of ontologies for Semantic Nomenclature: pharmaceutical
networked ontologies. NeOn Deliverable D8.3.2, 2008
Abstract
In recent years, great efforts have been made to computerize healthcare systems. These efforts
represent a great leap forward in both quantitative and qualitative patient care. However, nearly
all currently developed systems are still built ad hoc for each organization, which makes
communication between organizations a time- and money-consuming task. In this document we present
a platform for standardized Electronic Health Record (EHR) communication, based on software
agents, that provides many functionalities for standard EHR communication.
Keywords: Electronic Health Records, software agents, standardization.
1. Introduction
Communicating EHRs in a semantically interoperable way is still a work-in-progress problem. In
the best-case scenario, the solutions developed manage to connect a limited number of systems,
and such a solution cannot be exported to other, similar settings (for example, a solution
built for one hospital cannot be exported to another). Even worse, changing anything in any of
the integrated systems entails changes in the remaining systems if they want to access the
information of the former. This maintenance is time- and money-consuming, which makes it very
complex.
Because of these problems, new models for EHR communication based on a dual-model approach
[1] have been developed (norms like ISO 13606 and openEHR). The dual model for EHR
communication is based on the separation between information (the data) and knowledge (which
changes and improves over time). In the ISO 13606 and openEHR standards this knowledge is
represented as archetypes, formal descriptions of the domain concepts. These archetypes can (and
should) be defined by domain experts, who know which concepts the system uses.
A standard software agent platform based on these dual models has been developed to solve this
problem. The platform can be used to access both standardized and non-standardized data sources.
In the latter case, the LinkEHR tool is used to generate the standard EHR extracts using the
chosen reference model [2].
In this work we describe the architecture of a dual-model-based health information system and
show how the proposed architecture of software agents achieves the integration of distributed EHR
systems.
2. Background
Software agents were chosen as the platform technology because they have solved the
integration problem in several health scenarios [3-4]. However, until now this integration has
been done ad hoc for the systems involved, so the result of the integration is still not shareable in an
interoperable way. This is an essential aspect of the developed system, as sharing health records
between different systems is currently a necessity.
Despite that, the use of standards in healthcare is still at an initial stage. With some
exceptions (such as DICOM [5], the de facto standard for image storage and communication), there
is not yet much awareness among public administrations and healthcare professionals [6]. One of
the steps needed for the improvement and evolution of current systems is the use of EHR standards.
Although dual-model standards are being used in production systems in Australia, the Netherlands
and England [7], the dual-model standards field is still unknown to a large number of professionals.
3. Methods
The developed multi-agent system is a distributed EHR system based on software agents. The
platform allows all the connected systems to be standardized. For the implementation and tests,
the JADE platform (http://jade.tilab.com/) was used.
The proposed agent system allows searching and retrieving of standardized extracts from
distributed data sources. We created an ontology based on the knowledge of a dual model based
clinical system (archetypes, EHR extracts, patients, etc.). This ontology has allowed us to define
four different agent roles for the platform: The archetype repository agent, the EHR agent, the
register agent and the user agent.
The archetype repository agent: an agent assigned to each known archetype
repository. This agent has an XML database [8] that allows queries over the archetypes. These
queries allow:
o Storing archetypes in the repository (or a full load of the repository)
o Updating archetypes, changing their version or marking them as deleted
o Querying the database to get archetypes by their id (archetype name), the organization
that created them (CEN, openEHR, CCR, etc.), the archetype description (institution,
author, date, status, etc.), the textual content of the archetype, archetype paths and
the ontology section of the archetype
All these queries are applicable to the “integration archetypes” developed by the IBIME group [9],
which are archetypes mapped to a data source from which we want to extract information in a
standardized way. This agent provides access to archetypes and their mappings.
The defined queries allow the clinical domain expert to search for the most suitable archetype for
a health record query. Ontology rules can be applied to the archetype search, as the ontology
bindings of the archetype can easily be obtained.
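The repository agent's query capabilities can be sketched with a minimal in-memory stand-in for its XML database. The archetype identifiers and metadata fields below are illustrative (the real agent queries an eXist XML database over ADL archetypes):

```python
class ArchetypeRepository:
    """In-memory stand-in for the agent's XML archetype database."""

    def __init__(self):
        self._store = {}  # archetype id -> metadata dict

    def store(self, archetype_id, organization, description):
        self._store[archetype_id] = {
            "organization": organization,   # e.g. CEN, openEHR, CCR
            "description": description,     # institution, author, status...
            "deleted": False,
        }

    def mark_deleted(self, archetype_id):
        # Archetypes are marked as deleted rather than removed.
        self._store[archetype_id]["deleted"] = True

    def by_id(self, archetype_id):
        return self._store.get(archetype_id)

    def by_organization(self, organization):
        return [aid for aid, meta in self._store.items()
                if meta["organization"] == organization and not meta["deleted"]]

# Hypothetical archetype ids, loaded as a domain expert might register them.
repo = ArchetypeRepository()
repo.store("openEHR-EHR-OBSERVATION.blood_pressure.v1", "openEHR",
           "blood pressure measurement")
repo.store("CEN-EN13606-ENTRY.medication.v1", "CEN", "medication order")
```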
The EHR agent: the agent assigned to each of the health record data sources. The
main function of this agent is the generation of standardized EHR extracts from the
available data sources of the system. These data sources may be non-standardized;
in that case the agent must preprocess the existing information to standardize it.
This preprocessing is performed by the LinkEHR tool, which provides an easy way to access
existing health data by using integration archetypes. The process consists of mapping
the data sources to the archetypes used by the system [9]. The mapping definition does
not need to be specified if the source can already provide standardized EHR extracts; in
that case the agent only fills in the headers of the extract wrapping the data and transfers
it.
The register agent: this agent holds the information about the systems where a patient has
parts of his health information (the clinical information index). It also unifies all the
received EHR extracts into a single one, which is then transferred to the user agent. The
clinical census is a basic service for every system whose information is scattered across
several sources. For example, in the Spanish national health service electronic health
record project (HCDSNS), a clinical information index will be created; this service will
show in which regional systems a patient has part of his EHR.
The user agent: these agents are assigned to each user of the system. Their function is
to act as the gateway between the users and the system (understanding users as both
people who access directly through a graphical or web interface and software programs
that access the system through web services). These agents are able to ask the repository
agents for the most suitable archetype for the EHR query. Once the user knows the
archetype identifier of the concept, he can query the register agent to obtain the EHR
extract from the complete EHR of the desired patients. Additionally, this agent should
provide the authentication mechanisms and the certificates needed to assure the security
of the system.
The user agent is the access point to the system. It can obtain standardized extracts and
ask about the available archetypes.
To obtain an EHR extract, the user agent first queries the register agent with a patient id and
an archetype identifier. The register agent then asks the clinical census for the places where
the patient has parts of his health information and queries the EHR agents over each of those
data sources. The EHR agents build the standardized EHR extract for their data source, and
finally those extracts are unified by the register agent and returned to the user agent as
a single extract.
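The retrieval sequence above can be sketched as follows. Agent names and data shapes are illustrative stand-ins: the real platform exchanges JADE/FIPA messages rather than making direct method calls:

```python
class EHRAgent:
    """Stand-in for the agent wrapping one health record data source."""

    def __init__(self, source_name, records):
        self.source_name = source_name
        self.records = records  # patient_id -> {archetype_id: data}

    def build_extract(self, patient_id, archetype_id):
        data = self.records.get(patient_id, {}).get(archetype_id)
        return {"source": self.source_name, "data": data} if data else None

class RegisterAgent:
    """Holds the clinical census and unifies the partial extracts."""

    def __init__(self, census, ehr_agents):
        self.census = census          # patient_id -> [source names]
        self.ehr_agents = ehr_agents  # source name -> EHRAgent

    def get_extract(self, patient_id, archetype_id):
        # 1. Ask the clinical census where the patient has data.
        sources = self.census.get(patient_id, [])
        # 2. Query the EHR agent of each source; 3. unify into one extract.
        parts = [self.ehr_agents[s].build_extract(patient_id, archetype_id)
                 for s in sources]
        return {"patient": patient_id, "archetype": archetype_id,
                "parts": [p for p in parts if p]}

# Hypothetical data sources and patient id.
hospital_a = EHRAgent("hospital_a", {"p1": {"blood_pressure.v1": "120/80"}})
hospital_b = EHRAgent("hospital_b", {"p1": {"blood_pressure.v1": "130/85"}})
register = RegisterAgent({"p1": ["hospital_a", "hospital_b"]},
                         {"hospital_a": hospital_a, "hospital_b": hospital_b})
extract = register.get_extract("p1", "blood_pressure.v1")
```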
To query the archetypes, the queries are built in the user agent and sent to the repository
agent, which returns the archetype or the archetype list to the user agent.
Result subscription agents: in a similar way, agents could be designed to notify users
when a result is ready to be served.
Acknowledgments
This work has been funded by project TSI2007-66575-C02-01 from Ministerio de Educación y
Ciencia, the Consellería d’Empresa, Universitat i Ciencia, reference APOSTD/2007/055 and the
Programa de Apoyo a la Investigación y Desarrollo (PAID-06-07) from the Universidad Politécnica
de Valencia.
References
[1] T. Beale. Archetypes, Constraint-based Domain Models for Future-proof Information Systems.
(2001) http://www.deepthought.com.au/it/archetypes/archetypes.pdf
[2] JA. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, JT. Fernández. Framework for clinical
data standardization based on archetypes. Stud. Health Technol. Inform., 454-8, (2007)
[3] G. Lanzola, A framework for building cooperative software agents in medical applications.
Artif. Int. Med., Volume 16, Issue 3, Pag 223-249 (1999)
[4] D. Isern, D. Sánchez Moreno, A. Valls, HeCaSe: an agent-based system to provide personalized
medical services. CAEPIA (2003)
[5] DICOM: Digital Imaging and Communications in Medicine http://medical.nema.org/
[6] ICT standards in the health sector: current situation and prospects
http://www.ebusiness-watch.org/studies/special_topics/2007/eHealthStandards.htm
[7] M. Al-Ubaydli. Open source medical records systems around the world. UKHIT 49 (2006)
[8] eXist – Open Source Native XML Database http://exist.sourceforge.net/
[9] Bosca D, Moner D, Maldonado JA, Angulo C, Robles M. LinkEHR: a tool for standarization and
integration of legacy clinical data. Proceedings of the 12th World Congress on Health Informatics
(MedInfo'07).
Abstract
This paper describes the CARDEA platform. Its main technologies are described, together with how
they are used in the platform. The CARDEA architecture deployed at Gregorio Marañon Hospital is
shown, along with the expected results of the pilot.
Keywords: monitoring, hospital, medicines, patients, RFID, OSGi, ESB, SIP
1. Introduction
A hospital enclosure is usually a complex environment with a huge number of rooms, normally
arranged in different, connected buildings. This situation creates many obstacles for the
transmission of radio signals, so solid strategies, redundant positioning measures and radio
technologies with good material-penetration capabilities must be used in such an environment.
A typical hospital service may be composed of physically and logistically scattered units that
serve a large number of patients with very high turnover, and where many resources, both hospital
staff and assets (medicines, material and equipment), need to be managed, in many cases residing
outside the service itself.
In order to allow the integration of a new generation of multimedia services into the hospital
environment in a uniform way, the CARDEA project researches the definition and development of a
service platform for hospital monitoring based on different standards, which in turn allows third
parties to deploy services in a standard, safe and controlled way, without having to invest in
proprietary solutions and difficult integrations.
2. Technologies
As explained before, this project has been developed using four different technologies:
The global element in this environment is the framework, for which we have used OSGi
technology.
The use of RFID tags allows the platform to capture information about medicines, such
as quantity, name, composition, expiry date, etc.
The captured information is processed in the platform, which generates alarms and
error messages that are sent to mobile nodes. For this functionality we have used
SIP technology, integrating SIP elements into the OSGi framework.
CARDEA is context-aware. The location of people acting in the hospital, together with
assets such as expensive medicines, is treated as dynamic. A Semantic Web ontology
managed with Jena, a knowledge-rule API integrating Pellet, and a middleware for
capturing, storing and delivering context information are integrated. This subsystem
is called OCP (Open Context Platform).
For a correct understanding of how this system works, it is necessary to explain the main
features of the technologies involved: OSGi, RFID, SIP and OCP.
OSGi
The OSGi platform (Open Service Gateway initiative) was defined by the international OSGi
Alliance. Its main objective was to define open software specifications for designing
compatible platforms able to provide multiple services. OSGi defines an extremely
efficient infrastructure for designing service-based applications inside a Java Virtual Machine
(JVM) and provides a development environment that runs on Java and is 100% compatible with J2ME.
The main part of the infrastructure is the framework, which implements a dynamic component-
based model. With this model the environment is able to manage the applications installed in
the framework dynamically: the applications (named bundles) can be installed, started, stopped,
uninstalled and updated remotely without needing to restart either the device or the framework.
The key element in the OSGi framework is the component or bundle. Every bundle can consume
services provided by other bundles and can provide other services at the same time. This
process works as follows:
A bundle can register and unregister services in the container dynamically. To enable
interaction, the bundle must register a service interface and a class that implements
this interface. Every change in the service (registration, change, unregistration)
produces events that are captured and processed by the framework.
If another bundle wants to use the service registered by the previous bundle, it asks the
platform for the service reference. Once it obtains the service reference through the OSGi
platform, the service is ready to be used: the consumer bundle can call any method of the
service whose implementation was registered by the provider bundle.
Services remain available and registered as long as the bundles that implement them are
installed in the platform.
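The register/lookup cycle can be simulated outside OSGi with a toy registry. This is a language-neutral sketch, not the real OSGi API (which is Java, e.g. BundleContext registration and service references); the service and event names are illustrative:

```python
class ServiceRegistry:
    """Toy stand-in for the OSGi service registry."""

    def __init__(self):
        self._services = {}   # interface name -> implementation object
        self.events = []      # framework-style service events

    def register(self, interface, implementation):
        self._services[interface] = implementation
        self.events.append(("REGISTERED", interface))

    def unregister(self, interface):
        self._services.pop(interface, None)
        self.events.append(("UNREGISTERED", interface))

    def get_reference(self, interface):
        # A consumer bundle asks the platform for a service reference.
        return self._services.get(interface)

# Provider bundle: registers a service interface plus an implementing class.
class GreetingService:
    def greet(self, name):
        return f"Hello, {name}"

registry = ServiceRegistry()
registry.register("GreetingService", GreetingService())

# Consumer bundle: obtains the reference and calls any of its methods.
service = registry.get_reference("GreetingService")
message = service.greet("ward 3")
```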
RFID
RFID (Radio-Frequency Identification) is an automatic identification method relying on storing
and remotely retrieving data using devices called RFID tags or transponders. The term describes
a communication system that transmits the information stored in an RFID tag wirelessly; the
technology requires some degree of cooperation between an RFID reader and an RFID tag.
An RFID tag contains the identity of an object and can be applied to a product, resource, person,
etc. A tag can be divided into two parts: an integrated circuit for storing and processing
information and for modulating and demodulating the radio-frequency signal, and an antenna for
receiving and transmitting the signal.
The main properties of RFID technology for this project are: it allows the system to store
information about the products marked with RFID tags; RFID tags can be read from several meters
away and beyond the line of sight of the reader; and RFID tags have become very cheap, with a
unit cost of a few euro cents.
In this project we have used passive RFID tags, which have no battery, instead of active RFID
tags, which contain a battery so they can be located by the reader.
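The information a medicine tag carries (name, quantity, expiry date, etc.) can be sketched as a small record with an expiry check. The tag id and medicine below are hypothetical examples, not values from the pilot:

```python
from datetime import date

class MedicineTag:
    """Illustrative payload of an RFID tag stuck on a medicine unit."""

    def __init__(self, tag_id, name, quantity, expiry):
        self.tag_id = tag_id        # unique tag identification
        self.name = name            # medicine name
        self.quantity = quantity    # units in the package
        self.expiry = expiry        # expiry date stored on the tag

    def expired(self, today):
        # A reading after the expiry date should raise an alarm.
        return today > self.expiry

# Hypothetical tag read by a reader at the pharmacy corridor.
tag = MedicineTag("E2003412", "amoxicillin 500mg", 20, date(2009, 6, 30))
```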
SIP
The SIP protocol (Session Initiation Protocol) is an application-level signaling protocol defined
by the IETF (Internet Engineering Task Force) in RFC 3261 [4]. The aim of the IETF is for SIP to
become the standard for the initiation, modification and termination of interactive sessions in
which multimedia elements such as video, voice, instant messaging, online games, etc. participate.
SIP supports device mobility and location independence and has robust security specifications.
The main feature of SIP, and the one that made us choose it for the project, is that it can
resolve addresses through URIs (Uniform Resource Identifiers) [5]. With these addresses we can
determine the physical address of the user at any time, as well as the IP address of the device
in use. Any user can establish a communication with another user knowing only the URI identifier.
Other features of SIP are session negotiation, call management, modification of the features of
an established session, and the possibility of updating the protocol with extensions.
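The addressing idea can be illustrated by splitting a SIP URI of the basic `sip:user@host[:port]` form into its parts (a simplified subset of the grammar in RFC 3261; the user and domain below are hypothetical):

```python
import re

# Basic sip:user@host[:port] form; real SIP URIs also allow parameters, etc.
SIP_URI = re.compile(r"^sip:(?P<user>[^@]+)@(?P<host>[^:;]+)(?::(?P<port>\d+))?$")

def parse_sip_uri(uri):
    """Split a SIP URI into user, host and port (5060 by default)."""
    match = SIP_URI.match(uri)
    if not match:
        raise ValueError(f"not a SIP URI: {uri}")
    port = int(match.group("port") or 5060)  # 5060 is the default SIP port
    return match.group("user"), match.group("host"), port

user, host, port = parse_sip_uri("sip:nurse01@cardea.example.org")
```

A registrar then maps the user part to the IP address of whatever device the user is currently registered on, which is what gives SIP its location independence.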
OCP
OCP is a middleware that allows services and applications in a service-oriented architecture to
be context-aware. Context-aware computing is a recently emerged paradigm that allows software to
adapt to changes in the environment. Adaptation is done by services and applications using
up-to-date information about the state of end users (i.e. patients and hospital personnel in the
case of CARDEA).
In OCP environments, software entities play two main roles. The first is the context producer: a
dynamic entity, usually a software entity related to a concrete person, which changes as the
user's state changes. For example, a CARDEA client running on a handheld device will change its
current location within the hospital when the nurse carrying the device changes her location. The
second role is the context consumer: usually a service or application that consumes context
information to adapt its behaviour. For example, a different interface for the CARDEA application
is displayed depending on the device the user is working with (e.g. PDA, laptop, etc.).
Users, devices, the physical environment: all this information is represented in a domain
ontology. This representation is based on OWL and is hosted and managed by Jena. Among the main
advantages of such a representation is a common and shared representation of the domain for all
the CARDEA elements, managed by OCP.
4. Global Architecture of CARDEA
The diagram of Fig. 2 describes the architecture by levels and identifies the subsystems involved
and the relationship between them.
The global element in the infrastructure is the OSGi node. It deploys the basic elements of
CARDEA: the Contextual Information Management System (OCP), the service that allows multi-device
interaction (SIP), and the hospital and laboratory services accessed through the Mule module.
Several layers can be distinguished: the bottom layer is the operating system, with a Java
virtual machine installed; running on the virtual machine is the OSGi framework, which is
100% Java; and the items placed on the upper layer are the corresponding OSGi bundles, which
register services in the platform that can be used by other applications.
One example of the services built on this architecture is the “OCP - Contextual Information
Management System”, formed by the following components:
Jena [6]: open-source Semantic Web framework for Java. It provides a mechanism to
manage the ontology in OWL format, extract the data, store it in the database and obtain
results through its inference engine.
OCP Service: main part of the OCP system. It creates context objects with the information
provided by other agents and manages them. It communicates with Jena in order to make
the information persistent.
RFID: acts as an RFID server. This service receives the information read by the RFID
readers and transmits it to the OCP service so that it can be transformed into context
information.
SIP: the SIP service is used to establish communication between the OSGi framework and
external elements (mobile phones, PDAs, etc.). The SIP service is provided by a
bundle installed in the platform, so every bundle can use it to send messages or
make SIP calls.
ESB: this element is a middleware based on synchronous and asynchronous messaging
that provides secure interoperability between applications using XML, allowing business
applications to communicate with each other. This service makes logical decisions using
the information provided by the OCP service and uses the SIP bundle to reach external
entities.
1. When a medicine tagged with RFID crosses the system of RFID readers, the information on
the tag is transmitted to the RFID service in the OSGi platform.
2. This information contains the type of medicine, the expiry date, the current time and
the identification of the tag. When the OCP service is called, the attributes that define a
medicine are transformed into a context object called “Medicine”, managed by the OCP
service.
3. The OCP service stores this object in a database through the Jena service, so the
database contains all the object references generated in the OCP bundle.
4. The ESB service is subscribed to the context notifications generated by new events in the
OCP bundle. This subscription allows the ESB service to be notified when a new medicine
object arrives, so it can determine the stock of a given medicine and act accordingly.
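The four steps can be sketched as a minimal pipeline. The storage list and subscription callback are simplified stand-ins for the Jena-backed store and the ESB, and the tag values are hypothetical:

```python
class OCPService:
    """Turns raw RFID readings into 'Medicine' context objects (step 2),
    stores them (step 3) and notifies subscribers (step 4)."""

    def __init__(self):
        self.database = []      # stand-in for the Jena-backed store
        self.subscribers = []   # e.g. the ESB service

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def on_rfid_reading(self, reading):
        medicine = {"type": "Medicine", **reading}  # context object
        self.database.append(medicine)
        for notify in self.subscribers:
            notify(medicine)

class ESBService:
    """Keeps a stock count per medicine name from context notifications."""

    def __init__(self):
        self.stock = {}

    def on_context(self, medicine):
        name = medicine["name"]
        self.stock[name] = self.stock.get(name, 0) + 1

ocp = OCPService()
esb = ESBService()
ocp.subscribe(esb.on_context)

# Step 1: tagged medicines cross the RFID readers (hypothetical readings).
ocp.on_rfid_reading({"tag": "E2003412", "name": "amoxicillin",
                     "expiry": "2009-06-30"})
ocp.on_rfid_reading({"tag": "E2003413", "name": "amoxicillin",
                     "expiry": "2009-06-30"})
```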
5. Validation at Gregorio Marañon Hospital
The CARDEA platform is being tested in a real scenario by means of a pilot deployed at Gregorio
Marañon Hospital. This scenario has the following elements:
CARDEA platform. The CARDEA platform was installed on a server located at the Gregorio
Marañon facilities.
RFID tags. These tags are stuck on the medicine units (Fig 2-b).
RFID sensors (Fig 2-a). Two groups of RFID sensors are installed in the Gregorio Marañon
Hospital facilities.
RFID PDA reader (Fig 2-b). This reader registers the tags stuck on medicine units and
inserts them into the CARDEA platform. Besides registration, the reader allows getting
and modifying the registered information and checking the location of the medicine units.
Console of activity. This console allows viewing the activity registered by the CARDEA
platform. Basically, it receives all the notifications sent by the CARDEA ESB module.
SIP agent on a device emulator. A SIP agent was installed in a mobile device emulator.
This agent receives different alarm notifications from the CARDEA SIP module.
Currently the pharmacy service of Gregorio Marañon Hospital is validating CARDEA and the results
will be collected at the end of March.
Fig. 2: a) Pharmacy service corridor, showing two of the RFID sensors that register medicine
units leaving the service. b) Medicine units with RFID tags and the RFID PDA reader.
Acknowledgments
Many people have been involved in the success of CARDEA platform: Alfredo Pedromingo, Eneko
Taberna, Pablo Piñeiro (Ariadna Servicios Informáticos); Francisco López (Murcia University),
Augusto Morales (Universidad Politécnica de Madrid), Carlos Ángel Iglesias (E-Práctica), Dra Ana
Herranz and Dra Arantxa (Pharmacy service in Gregorio Marañon Hospital)
References
[1] OSGi Alliance Home Page, http://www.osgi.org/Main/HomePage
[2] RFID Technology information page, http://en.wikipedia.org/wiki/RFID
[3] SIP information page,http://en.wikipedia.org/wiki/Session_Initiation_Protocol
[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley
and E. Schooler, "SIP: Session Initiation Protocol", Internet Eng. Task Force RFC 3261, June 2002.
[5] T. Berners-Lee, R. Fielding, U.C. Irvine, L. Masinter, "Uniform Resource Identifiers (URI):
Generic Syntax", Internet Eng. Task Force RFC 2396, 1998.
[6] Jena project page, http://jena.sourceforge.net/
Abstract
Healthcare is inherently process-oriented in the sense that care requires continuous assistance,
sustained in time and linked to previous states or events. Process orientation has two main
aspects: the natural or biological processes people are subject to, and the planned, systematized
care processes devised by professionals and managed by healthcare providers or institutions. The
paradigm of Personal Health Records (PHR) places the personal, subjective view of health at the
centre of the data model, and it must account for biological and care processes if the
possibilities of enhancing safety and empowering the patient are to be maximized. This paper
explores the elements required to integrate process models in personal health record platforms,
and the role of ontologies in making process information actionable.
Keywords: Healthcare process, ontology
1. Introduction
The Merriam-Webster on-line dictionary includes the two following senses for the word “process”:
(a) a natural phenomenon marked by gradual changes that lead toward a particular result, e.g. the
“process of growth”, and (b) a series of actions or operations conducing to an end. The former
sense can be considered to include natural processes such as pregnancy, but also other biological
processes such as illness (be it chronic or transient), disorders or traumatisms. The latter can be
interpreted as purposeful actions performed by humans towards an end, including healthcare
or clinical processes. Obviously, natural and care processes are in many cases interwoven, as
care processes are typically triggered by natural or biological processes and their steps attempt to
follow and intervene in their evolution. It is widely acknowledged that analyzing and modeling
healthcare processes is significant to improve patient safety (Carstens et al., 2009).
The concept of personal health record (PHR) allows for combining in a single information
technology piece both the personal, subjective view of natural processes and the planned and
systematic course of care processes (Tang et al., 2005). This enables providing services or alerts
based on analyzing the user's state in declared natural processes, combined with clinical
knowledge represented and associated to these processes, and informed by the results of tests,
procedures, medications and other care events. This also has a significant application in
anticipating alternate paths in the biological processes by exploiting what is represented in clinical
process ontologies. In the clinical domain, there are several kinds of long-lived processes (i.e.
spanning more than a single session and potentially weeks, months or years), including protocols,
guidelines and preventive programs. Several languages for guideline and protocol modeling have
been developed in recent years, including GLIF, PROforma and the Arden Syntax, to name a few.
These are care-oriented in the sense that they are devised for the systematic delivery of care
events from a healthcare institution perspective. However, when approaching the problem from
the PHR perspective, new opportunities appear from the combination of the knowledge about
several ongoing processes for a given person, and from the possibility of combining them with
purely subjective statements on health not coming from any health record maintainer institution.
The rest of this paper is structured as follows. Section 2 briefly sketches existing technology for
process-oriented healthcare and explores how PHR systems could integrate with them. Then,
Section 3 discusses the role of reasoning about natural and care processes as a means to increase
patient safety, link to healthcare offerings and improve the self-tracking of personal processes.
Finally, conclusions and outlook are provided in Section 4.
2. Representing processes associated to PHR
There are a number of languages specific to clinical computer-interpretable guidelines (CIG) that
are close to general-purpose executable workflow (orchestration) languages such as XPDL, XLANG or
BPEL (Mulyar, van der Aalst, and Peleg, 2007).
To make the discussion concrete, we will focus in what follows on PROforma, a well-equipped
language for executing care guidelines (Sutton and Fox, 2003), and on the technology used in the
GoogleHealth PHR. These technologies represent a typical deployment scenario of a process-based,
multi-source health record under full user control. GoogleHealth is currently based on the
ASTM Continuity of Care Record (CCR). The CCR was developed to store the most relevant patient
information electronically and make it available to all providers, systems, and patients. An
important aspect of the ASTM CCR is that it is technology neutral, which makes it a good candidate
for PHR (Smolij & Dun, 2006). The CCR can be used as the base information model for a
process representation, and recommendations can be directed either to the user or to healthcare
professionals. Table 1 provides an illustration of the mapping of a fragment of the NICE guideline
CG62 “Antenatal care”7 to CCR elements.
Table 1. Example guideline and mapping to CCR elements

Clinical guideline:
“Screening for gestational diabetes using risk factors is recommended in a healthy
population. At the booking appointment, the following risk factors for gestational
diabetes should be determined:
− body mass index above 30 kg/m2
− previous gestational diabetes
− family history of diabetes (first-degree relative with diabetes)…”

CCR elements:
Pregnancy: FunctionalStatus/Function [...77386006/SNOMED]
Weight: VitalSigns/Result/Test [...363808001/SNOMED]
Height: VitalSigns/Result/Test [...50373000/SNOMED]
Gestational diabetes: Problems/Problem [...648.8/ICD9]
NOTE: Family history requires links to related CCR profiles.
7 http://www.nice.org.uk/nicemedia/pdf/CG062NICEguideline.pdf
The example in Table 1 can be mapped to a decision element in PROforma. The decision step is
simply that of recommending screening for gestational diabetes to the individual or the healthcare
provider. In the first case and considering GoogleHealth, this can be realized through a notice (an
Atom Feed with an optional CCR document).
The process execution engine can be implemented as a client application of the PHR that uses data
extraction from the PHR and posting of notices as inputs and outputs. In terms of PROforma, the
main task would be mapping PHR information (represented for example in CCR) to the different task
elements. For example, the guideline fragment in Table 1 would be a decision with two candidates
(recommending screening or not), with choice_model set to single and support_mode set to
symbolic. Then, the condition that must be true in order for a candidate to be
“recommended” would go in the recommendation property of that candidate, and it would be
expressed in terms of the information extracted from the PHR shown in Table 1.
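Assuming the values have already been extracted from the CCR elements listed in Table 1, the recommendation condition of that candidate could be sketched as follows; the function and parameter names are hypothetical illustrations, not CCR or PROforma identifiers:

```python
# Sketch: evaluating the "recommend screening" candidate of the decision in
# Table 1 against values extracted from a CCR document. Parameter names
# (weight_kg, height_m, ...) are hypothetical, not CCR element names.

def recommend_gd_screening(weight_kg: float,
                           height_m: float,
                           previous_gestational_diabetes: bool,
                           family_history_diabetes: bool) -> bool:
    """True when at least one CG62 risk factor for gestational diabetes holds."""
    bmi = weight_kg / (height_m ** 2)  # from VitalSigns/Result/Test values
    return (bmi > 30.0
            or previous_gestational_diabetes
            or family_history_diabetes)

# Example: a BMI of about 31.2 alone triggers the recommendation.
print(recommend_gd_screening(85.0, 1.65, False, False))  # True
```

In a real deployment the three history flags would come from the Problems section and linked CCR profiles, subject to the open-world caveats discussed in Section 3.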
PROforma plans can be generated to program appointments (from the information contained in
the Pregnancy condition’s start date); for example, the CG62 guideline states that “for a woman
who is nulliparous with an uncomplicated pregnancy, a schedule of 10 appointments should be
adequate”. The different tests included in the guideline can be arranged in the plan schedule with
CycleConditions, for example.
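A sketch of how such a plan schedule could be derived from the Pregnancy condition's start date; the gestational weeks used below are an illustrative reading of the CG62 schedule, not an exact transcription of the guideline:

```python
# Sketch: generating a 10-appointment antenatal schedule from the pregnancy
# start date. The week offsets are illustrative assumptions.
from datetime import date, timedelta

APPOINTMENT_WEEKS = [10, 16, 25, 28, 31, 34, 36, 38, 40, 41]  # 10 appointments

def antenatal_schedule(pregnancy_start):
    """Return one appointment date per entry in APPOINTMENT_WEEKS."""
    return [pregnancy_start + timedelta(weeks=w) for w in APPOINTMENT_WEEKS]

schedule = antenatal_schedule(date(2009, 1, 5))
print(len(schedule))  # 10
```

A PROforma plan would additionally attach the tests due at each visit as tasks with cycle conditions; this sketch only shows the date arithmetic.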
In addition to the care processes typically specified in guideline process models, personal
processes (subjective, biological ones) can be modeled in a loose, episodic way, e.g. occasional
fever self-reporting. For these indications to be useful it is important to have an adequate level of
detail in the data model. Coming back to the CG62 guideline, it states “All pregnant women
should be made aware of the need to seek immediate advice from a healthcare professional if
they experience symptoms of pre-eclampsia. Symptoms include: […] severe pain just below the
ribs”. While pain is available as a condition in GoogleHealth, there is no formal way of specifying
location. The ribcage is represented in the Foundational Model of Anatomy (FMA8) ontology with
id 7480, so combining that representation with some relative location predicates associated to
symptoms would enable representing that aspect of the guideline. This can be integrated in PHR
systems by combining ontologies with predicates specific to symptoms and signs.
3. Reasoning in the context of processes
Reasoning and inference require health information to be represented in some form of knowledge
representation formalism. There are existing reports of integrating process models with clinical
care ontologies. For example, Eccher et al. (2005) describe a system combining ontologies with
openEHR archetypes in the framework of heart failure prevention and monitoring. However, the
ontological description reported is limited to classifying processes as biological and non-biological,
the latter being further categorized as MEDICAL-VISITs and DIAGNOSTIC-INVESTIGATIONs, and
borrowing process semantics from DOLCE (Gangemi et al., 2002). Fox et al. (2006) have described
a comprehensive approach to supporting complex treatment plans and care pathways, focusing on
the expression of goals.
Reasoning processes can be applied on the data and process model described above. It is
important to note that an open world assumption is required, as the absence of data on some
8 http://sig.biostr.washington.edu/projects/fm/
information piece cannot be interpreted as negation. For example, it might be that for the
decision in Table 1 the user has not downloaded or registered data on family history. This fits with
the notion of enquiry tasks in PROforma, which are used to get or request (missing) information.
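A minimal sketch of this open-world treatment of missing data, where the absence of a value routes to an enquiry instead of being read as a negative finding (all names are hypothetical):

```python
# Sketch: open-world evaluation of a PHR field. None means "no data", which
# triggers a PROforma-style enquiry task rather than counting as negative.
from typing import Optional

def family_history_status(phr_value: Optional[bool]) -> str:
    if phr_value is None:
        return "enquire"  # missing data: request it from the user
    return "risk-factor" if phr_value else "no-risk-factor"

print(family_history_status(None))  # enquire
print(family_history_status(True))  # risk-factor
```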
Medication analysis can also be built for conditions. For example, NICE guideline CG22 “Anxiety”
includes the following: “Benzodiazepines are associated with a less good outcome in the long term
and should not be prescribed for the treatment of individuals with panic disorder. […]
Benzodiazepines should not usually be used beyond 2–4 weeks”. These decision points can be
implemented with SWRL if the supporting ontologies use OWL. The following rule can be
used at any time as a way to increase safety:
Patient(?p) and Medication(?m) and taking(?p, ?m)
and current(?m) and active-ingredient(?m, benzodiazepines)
and CurrentCondition(?c) and disorder(?c, panic-disorder)
-> alert(?p, ?m)
The rule above would have been different at diagnosis time, e.g.:
Patient(?p) and CurrentCondition(?c)
and disorder(?c, panic-disorder)
-> recommend-negative(?p, benzodiazepines)
In that second case, the guideline implementation gives advice to the healthcare professional
(we assume that the negative statement is important as information in the given clinical context).
In both cases, the rules are not deciding the flow of tasks in the process but serving as
alert triggers and aiding in the decision making of one of the steps. The decision steps themselves
can be modeled as SWRL rules in most cases; e.g. the simple procedural decision
procedures used in the Arden Syntax can be expressed that way. However, some guidelines are
fuzzy in nature, as in “beyond 2–4 weeks”, and mechanisms for dealing with such vagueness are not
present in ontology-based representations such as the combination OWL+SWRL.
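For illustration, the temporal part of this guideline, which falls outside what OWL+SWRL express comfortably, could be handled by a small procedural check; the 4-week cap below is one possible reading of "beyond 2–4 weeks" and is an assumption of this sketch:

```python
# Sketch: procedural check of the CG22 benzodiazepine duration limit, which
# is awkward to express in OWL+SWRL. MAX_WEEKS is an assumed reading of
# "beyond 2-4 weeks".
from datetime import date

MAX_WEEKS = 4

def benzodiazepine_alert(start: date, today: date) -> bool:
    """True when the prescription has run past the assumed 4-week cap."""
    weeks_on_medication = (today - start).days / 7
    return weeks_on_medication > MAX_WEEKS

print(benzodiazepine_alert(date(2009, 1, 1), date(2009, 2, 15)))  # True (~6.4 weeks)
```

Such a check would sit outside the ontology, firing the same alert predicate as the SWRL rule above.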
Reasoning external to processes also allows detecting the interaction of conflicting care processes.
For example, NICE guideline CG23 “Depression” states that “When depressive symptoms are
accompanied by anxious symptoms, the first priority should usually be to treat the depression.
Psychological treatment for depression often reduces anxiety, and many antidepressants also have
sedative/anxiolytic effects”. This requires a representation of the processes themselves. The
following sketches a possible formulation for the mentioned interaction in the context of a
healthcare event:
This is an example of meta-process tracking that would integrate with several ongoing
processes. This can be realized by having a meta-process ontology that unifies the task status of
the processes (as the results of those tasks can be assumed to be integrated in the PHR).
Smolij, K., Dun, K. (2006) Patient Health Information Management: Searching for the Right Model.
Perspect Health Inf Manag. V3 2006; 3-10.
Sutton, D.R. and Fox, J. (2003) The Syntax and Semantics of the PROforma Guideline Modeling
Language. J. Am. Med. Inform. Assoc. 10, pp. 433-443.
Tang, P.C., Ash, J.S., Bates, D.W., Overhage, J.M. and Sands, D.Z. (2005) Personal Health Records:
Definitions, Benefits, and Strategies for Overcoming Barriers to Adoption. J. Am. Med. Inform.
Assoc. 13: 121-126.
Abstract
The management of patients' demographic information in an information system is usually
considered a secondary problem. As a result, demographic information is scattered across the
organization or stored along with the clinical information. With the standardization of clinical
information becoming a popular topic, the standardization of demographic information becomes even
more important. This paper shows a way of generating standardized demographic repositories from the
different demographic sources available in the system, using a standardization process based on a dual
model approach.
Keywords: Demographic, standardization, EHR.
1. Introduction
Demographic data is key in any health information system. The value of demographic data
grows as systems turn from local to federated (as more systems are integrated, more diverse
demographic data is likely to appear). Thus, a good definition of the generic demographic
concepts of the system is needed in order to summarize all the requirements. The demographic
data should be stored according to those concepts.
However, nowadays the demographic data of the system is usually scattered across
organizations and systems: patient-related data is stored in a Master Patient
Index or within the clinical data; healthcare professional demographics are stored in another,
different system (usually an LDAP-like server, if they are stored at all); device demographic
information is stored in a resource catalog and on the devices themselves; and the demographic
information of organizations is provided by a resource catalog (for example, the catalog of
Spanish health organizations can be found on the Ministry of Health website [1]).
There are also two dual model9 standards for EHR interoperability (CEN EN13606 and openEHR).
Both define a reference model and a demographic reference model [2], [3]. The instances defined
by the demographic model are used along with the EHR instances defined by the reference model
to create the EHR extract. Figure 1 shows a portion of a demographic instance from a CEN
EN13606 EHR extract.
9 The dual model for EHR communication is based on the separation between information (the data) and knowledge
(what we know about the data). Knowledge changes and improves over time. For further explanation check [4].
10 The XSD schemas for the CEN EN13606 demographic model and the openEHR demographic model can be downloaded at http://www.linkehr.com
Since demographic openEHR archetypes already exist, efforts have been aimed at creating the
CEN EN13606 demographic archetypes. Archetypes of the basic concepts from the CEN EN13606
demographic model were created. Figure 4 shows an archetype defined in the CEN EN13606 standard.
Once the archetypes have been created and the data source schemas have been imported into
LinkEHR, the mapping can be done. The mapping is done by defining functions between the data
source schemas and the archetype nodes. The full explanation of the mapping process can be found
in [6]. When the mapping has been defined, LinkEHR automatically generates the transformation
script to transform the data available in the data sources into XML demographic instances. These
instances can be stored in an XML-capable data source so they can be queried.
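A minimal sketch of the kind of transformation such a script performs, mapping a flat demographic record to an XML instance; the element names and OID below are illustrative placeholders, not the actual CEN EN13606 demographic schema:

```python
# Sketch: turning a flat demographic record into an XML demographic instance.
# Element names (IDENTIFIED_ENTITY, name, id) and the OID are placeholders,
# not the real CEN EN13606 schema.
import xml.etree.ElementTree as ET

def to_demographic_instance(record: dict) -> str:
    root = ET.Element("IDENTIFIED_ENTITY")
    name = ET.SubElement(root, "name")
    name.text = record["name"]
    ET.SubElement(root, "id", extension=record["oid"])  # demographic OID
    return ET.tostring(root, encoding="unicode")

xml_instance = to_demographic_instance(
    {"name": "John Doe", "oid": "2.16.840.1.113883.2.19.10"})
print(xml_instance)
```

LinkEHR generates the real transformation from the archetype mapping; this only illustrates the shape of the output instances that would be stored in the XML-capable repository.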
The creation of this repository allows the generation of complete EHR extracts, as the demographic
OIDs of the data are known at data transformation time. Those OIDs are queried against the
demographic repository and the resulting XML demographic instances are included in the
standardized EHR extract.
4. Discussion
From the study of the CEN EN13606 and openEHR demographic models, some differences have been
spotted. First of all, the aim of the two demographic models is different: CEN EN13606 provides the
minimal demographic information that should be attached to an EHR extract. The demographic
information included in the EHR extract is enough to retrieve the full demographic
information on the systems. The openEHR demographic model includes several attributes that can
store tables, lists or trees, which allows the creation of more complex structures in the
demographic section. Compared to CEN EN13606 demographics, openEHR can define a wider set
of structures for demographic information, as the openEHR model covers the full demographic system
while the CEN EN13606 standard models the demographic information of the extract.
Another difference is the separation, in the CEN EN13606 demographic model, of the telecom address
and the postal address. This eases the understanding of the instances, as different sets of codes are
used in each one.
Both models define role classes and define (or can define) the same demographic root classes
(persons, patients, devices, organizations, etc.).
5. Conclusion
Currently, demographic data is scattered across all the systems in the organization. There is no
easy way to extract all the available demographic data from the systems. The presented solution
proves that it is possible to standardize this demographic information using a dual model
approach. LinkEHR can be used to standardize both the EHR and the demographic data to generate
standardized EHR extracts. The creation of this repository provides a way of generating a complete
EHR extract from the un-standardized data available in the system.
Each one of the demographic models reviewed in this paper is aimed at one specific demographic
use. On the one hand, the CEN EN13606 demographic model is aimed at the transmission, within an
EHR extract, of the minimal part of the demographic information. That information should allow the
retrieval of the full demographic information available on the system. On the other hand, the
openEHR demographic model aims to model all the demographic information stored in the
system. Thus, both the openEHR and CEN EN13606 demographic models can coexist on the systems.
As additional results of this paper, both the CEN EN13606 and openEHR demographic XSD schemas
have been developed. Furthermore, with these XSD schemas imported into LinkEHR,
LinkEHR is the first archetype editor to support the creation of openEHR and CEN EN13606
demographic archetypes.
As future work, more specific archetypes should be created in order to define demographic
concepts available in real systems. Also, this work does not deal with the problem of generating
the unique identifiers of the demographic data in a unified demographic server. This will be
addressed in future work.
Acknowledgments
This work has been funded by project TSI2007-66575-C02-01 from Ministerio de Educación y
Ciencia, the Consellería d’Empresa, Universitat i Ciencia, reference APOSTD/2007/055 and the
Programa de Apoyo a la Investigación y Desarrollo (PAID-06-07) from the Universidad Politécnica
de Valencia.
References
[1] Primary care centers from the Spanish national health system catalog and hospitals national
catalog. http://www.msc.es/ciudadanos/prestaciones/centrosServiciosSNS/hospitales/home.htm
[2] CEN/TC251, EN13606-1: Health Informatics - Electronic Health Record communication, part 1.
[3] T. Beale, S. Heard, D. Kalra, D. Lloyd. The openEHR Reference Model, Demographic Information
Model. 2008.
http://www.openehr.org/releases/1.0.2/architecture/rm/demographic_im.pdf
[4] T. Beale. Archetypes, Constraint-based Domain Models for Future-proof Information Systems.
(2001) http://www.deepthought.com.au/it/archetypes/archetypes.pdf
[5] RDM. Dias, SM. Freire, Arquétipos para Representar as Informações Demográficas em Saúde.
XI Congresso Brasileiro de Informática em Saúde, 2008. v. 1. p. 1-6
[6] JA. Maldonado, D. Moner, D. Boscá, C. Angulo, M. Robles, JT. Fernández. Framework for clinical
data standardization based on archetypes. Stud. Health Technol. Inform., 454-8, (2007)
Abstract
The breast cancer medical process is almost entirely carried out manually and there is an evident risk of
human error due to the possible lack of experience of the staff (substitutions, sick leaves and other
reasons). Another fact is that the correct fulfillment of the process may depend on the personal attitude
of the administrative staff. In order to guarantee the safety of the patient, the whole process will be
automatically orchestrated and monitored. The support for the solution will be a service-oriented
architecture combined with semantic web techniques (archetypes and ontologies) to infer knowledge
from predefined rules and make the process safe.
Keywords: Electronic healthcare records, clinical archetypes, ontologies, OWL, Web Services, breast
cancer, prognostic factor.
1. Introduction
Breast cancer is a disease of increasing incidence that affects many women in our country. Every year,
40 cancer cases are diagnosed per 100,000 consultations; thus, breast cancer is the most frequent
malignant tumor in the female population.
The diagnosis of breast cancer involves a great number of professionals in different assistance
areas (family doctors, gynecologists, radiologists, pathologists, oncologists, administrative staff…)
and diverse diagnostic and treatment resources, which increases the complexity of handling it. It
poses an organizational challenge because many services of the healthcare system are involved,
and many different people interact to make the diagnostic and therapeutic process succeed.
The weakness in the chain is that currently the process can only be carried out manually, and the
whole process, involving administrative work, appointments and so on, is managed by the doctors.
The different doctors who see the patient during the assistance process can individually make
decisions that change the treatment, even though there is a written protocol describing the
general procedure.
For the reasons exposed above, there is a basic need for a decision-making tool that
communicates with the different IT systems and guarantees the reliability, control and monitoring
of the medical process. This tool will manage resources efficiently, independently of who the
operator is and of their attitude to work, in order to minimize human errors.
Decision making backed by artificial intelligence mechanisms plays an important role in this
project, because human errors due to lack of experience can be reduced and the addition of
automatically inferred knowledge will add relevant value to the research.
The use of ontologies is a key factor in this project because the knowledge of a doctor about a
diagnosis is difficult to transfer: it is based on personal experience, and the representation of this
knowledge to explain it to others is not homogeneous. The lack of a homogeneous representation
of knowledge is a problem when sharing and comparing experiences and knowledge among
professionals.
2. Development
Based on the research done by Matthew Hardy Williams11 (Integrating Ontologies and
Argumentation for decision-making in breast cancer), we produced our ontology for the data
obtained from Cruces Hospital.
11 http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/4410240/4410339/04410388.pdf?arnumber=4410388
12 http://www.breastcancer.org/symptoms/testing/types/
Based on these features (characteristics), the different stages13 of breast cancer, up to
metastasis, are defined.
13 http://www.breastcancer.org/treatment/planning/cancer_stage/
14 http://www.openehr.org/publications/archetypes/archetypes_beale_web_2000.pdf
15 http://www.openehr.org
2.3. Ontology
The following example shows a practical approach to a part of the ontology. The definition of
MsJones is described in Figure 2.
For example, let us suppose the following significant data to identify the breast cancer: Ms Jones is
a postmenopausal woman aged over 50 (53 years old) who, after undergoing the breast cancer test,
has a 5 cm tumor in her breast. She has more than 9 lymphatic nodes affected and there is no
metastasis.
<Women rdf:ID="MsJones">
<hasMetastasis>
<Metastasis rdf:ID="Met_Negative">
<hasResult rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>negative</hasResult>
</Metastasis>
</hasMetastasis>
<hasAge rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
>53</hasAge>
<rdf:type rdf:resource="#Aged50Plus"/>
<rdf:type rdf:resource="#Postmenopausal"/>
<hasTumor>
<Bigger5cm rdf:ID="Bigger5cm_1"/>
</hasTumor>
<hasLymphNodes rdf:resource="#More9Node_1"/>
</Women>
<rdf:Description rdf:about="http://acl/BMV#MsJones">
<j.0:hasAge rdf:datatype="http://www.w3.org/2001/XMLSchema#int">53</j.0:hasAge>
<j.0:hasLymphNodes rdf:resource="http://acl/BMV#More9Node_1"/>
<j.0:hasMetastasis rdf:resource="http://acl/BMV#Met_Negative"/>
<j.0:hasTumor rdf:resource="http://acl/BMV#Bigger5cm_1"/>
<rdf:type rdf:resource="http://acl/BMV#Postmenopausal"/>
<rdf:type rdf:resource="http://acl/BMV#Aged50Plus"/>
<rdf:type rdf:resource="http://acl/BMV#Women"/>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<rdf:type rdf:resource="http://acl/BMV#Adults"/>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
<j.0:recommendedDrugTreatment>Tamoxifen 40mg during 2 years</j.0:recommendedDrugTreatment>
<j.0:lymphNodesRecommendedTreatment>radiation to supravicular and/or internal mamary lymph nodes
and removed auxiliary lymph nodes</j.0:lymphNodesRecommendedTreatment>
<j.0:breastRecommendedTreatment>modified radical mastectomy followed by radiation and lumpectomy
plus radiation following chemotherapy to shrink a large single
cancer</j.0:breastRecommendedTreatment>
<j.0:hasCancerStage>OperableIIIC</j.0:hasCancerStage>
<j.0:metastasis>no</j.0:metastasis>
<j.0:hasKindLymph>N3</j.0:hasKindLymph>
<j.0:hasKindTumor>T3</j.0:hasKindTumor>
</rdf:Description>
As we can see, the current stage of the cancer for the patient has been inferred, as well as the
recommended treatment for the lymphatic nodes (radiation to the supraclavicular and/or internal
mammary lymph nodes and removal of the axillary lymph nodes), the drug dose and the duration of
the treatment (Tamoxifen 40mg during 2 years).
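The staging inference in this example can be mirrored in plain procedural code; the sketch below covers only the situation of the example (T3 tumor, N3 nodes, no metastasis) and is not a complete staging rule set:

```python
# Sketch: a plain-Python mirror of the staging inference for the MsJones
# example. Real staging has many more cases; this covers only the
# combinations needed to illustrate the example above.

def cancer_stage(tumor: str, nodes: str, metastasis: bool) -> str:
    if metastasis:
        return "IV"
    if nodes == "N3":
        return "OperableIIIC"  # the stage inferred for Ms Jones (T3, N3, M0)
    return "unstaged-in-this-sketch"

print(cancer_stage("T3", "N3", False))  # OperableIIIC
```

In the prototype this classification is produced by the reasoner from the ontology's class definitions rather than by hand-written conditionals.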
We have implemented a prototype with this part of the functionality to make a demonstration to
non-experts in the semantic web, following the W3C accessibility and usability recommendations.
The inferred knowledge is represented in a more user-friendly format using web development and
transformation techniques.
The prototype shows the inferred knowledge with the detail of the stage of the cancer, possible
treatments, etc.
It is important to note that the presented solution will be used to help the doctor make a
decision on the diagnosis or treatment. The final decision will always be a human decision.
3. Conclusions
- We developed a case study based on a breast cancer guideline, and in order to make this
feasible we have provided a simple prototype.
- We aimed to achieve the following:
- Model the results of clinical trials, and the background knowledge that provides the terms
used to describe the results of the trials.
- Model arguments for both beliefs and decisions.
- Take a piece of medical knowledge and represent it at different levels of
abstraction.
- Represent the terms related to breast cancer in order to unify concepts.
- Rapid access to the updated information of the patient, which will improve diagnosis
and treatment.
- The prototype will help doctors in decision making for diagnosis and treatment.
References
1. Douglas K. Barry. The Object Database Handbook: How to Select, Implement, and Use
Object-Oriented Databases. John Wiley and Sons, 1st edition, 1996.
2. Anita Burgun, Olivier Bodenreider, Christian Jacquelinet. Issues in the Classification of
Disease Instances with Ontologies. MIE 2006.
3. Bibbo M. Comprehensive Cytopathology. W.B. Saunders Co., Philadelphia, 1997.
4. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing
and querying RDF and RDF Schema, 2002.
5. Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American,
284(5):34-43, May 2001.
6. Clark D. P. Thyroid Cytopathology. Essentials in Cytopathology. Foreword by Edmund
S. Cibas, M.D. Series Editor Dorothy L. Rosenthal. Springer, 2005.
7. Amarnath Gupta et al. Towards a formalization of disease-specific ontologies for
neuroinformatics, 2003.
8. T. R. Gruber. A translation approach to portable ontology specifications. Knowledge
Acquisition, 6(2):199-221, 1993.
9. M. Hardy Williams. Integrating Ontologies and Argumentation for decision-making in breast
cancer. Doctoral Thesis, University College London, 2008.
Abstract
Terminology lexical alignment is a crucial task to enable interoperability between health care
applications indexed by different but related controlled vocabularies. In this study, we propose a
method to automatically reconcile two biomedical terminologies. First, the method identifies similar
terms across terminologies using a lexical technique provided by the National Library of Medicine (NLM)
to perform a search in the UMLS Metathesaurus. Second, based on this term alignment, the method
recognizes similar concepts on the basis of concept-to-term relationships. Third, the method validates
the lexical alignment by checking that concepts belong to similar top-level categories across
thesauri.
Keywords: controlled vocabularies, the Unified Medical Language System (UMLS), terminology mapping.
1. Introduction
Biomedical terminologies offer shared vocabularies, so they are key to integrating health care
systems. The need to use them in many health care activities, as well as in information retrieval,
has caused them to increase in number. With this proliferation, different health care systems
use different biomedical terminologies. Therefore, biomedical terminology alignment helps
establish agreement between different health care systems. In this context, the need for new tools
and techniques to reconcile different but related biomedical terminologies becomes crucial [3, 10,
11].
The use of lexical methods in terminology alignment produces high-quality mappings [3, 9, 10, 13].
However, the huge volume of data in biomedical terminologies hinders the manual revision of
lexical mappings; significant human effort is needed to interpret them suitably and guarantee the
validity of the final lexical alignment [12]. Therefore, new methods to automatically evaluate
lexical alignments are needed.
The work presented here exemplifies a method that can establish
correspondences between different but related biomedical terminologies. In the present study, we
examine a lexical technique (named NormalizeString) provided by the UMLS Knowledge Source
Server (UMLSKS)16, one of the most important publicly available resources in the biomedical
domain. We propose a method to automatically detect invalid lexical mappings and thus enhance
the lexical alignment generated by this lexical technique. The method was applied to map a large-
scale biomedical thesaurus (EMTREE17) to the complete UMLS Metathesaurus [1, 6].
16 http://umlsks.nlm.nih.gov
17 http://www.info.embase.com/emtree/about/
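NormalizeString matches strings after NLM's lexical normalization. As a rough illustration only (the real service, reachable through the UMLSKS, performs fuller processing such as uninflection), a simplified normalization in the spirit of that technique could be sketched as:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustrative approximation of NLM-style string normalization: lowercase,
// strip punctuation, drop a few stop words, sort the remaining words.
// This is NOT the official NormalizeString implementation, only a sketch.
public class Norm {
    private static final Set<String> STOP = Set.of("of", "the", "a", "an");

    public static String normalize(String term) {
        String[] words = term.toLowerCase()
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .split("\\s+");
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (!STOP.contains(w)) kept.add(w);
        }
        Collections.sort(kept);
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Two surface forms normalize to the same key, so they align lexically.
        System.out.println(normalize("Malignant neoplasm of thorax"));
        System.out.println(normalize("Thorax, malignant neoplasm"));
    }
}
```

Under this scheme, word order and inflection-free variants collapse to a single normalized key, which is what makes purely lexical matching productive across vocabularies.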
2. Background
Clinical terminology systems group words and phrases symbolizing the knowledge within a
particular domain [2]. There are multiple ways of expressing knowledge in a terminology [7].
Simplifying, terminologies can represent knowledge by means of terms only (term-based view) or
by means of both concepts and terms (concept-based view). In the second case, terms are only
labels (words) for concepts, which are the main entities encapsulating meaning.
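The distinction can be sketched with a minimal data structure (the identifiers and labels below are invented for illustration, not taken from EMTREE or the UMLS):

```java
import java.util.List;

// Concept-based view: a concept carries an identifier and the set of terms
// (synonyms) that label it. A term-based view would keep only the strings,
// losing the grouping under a shared meaning.
public class ConceptDemo {
    record Concept(String id, String preferredTerm, List<String> synonyms) {}

    public static void main(String[] args) {
        Concept thorax = new Concept("C001", "thorax",
                List.of("thorax", "chest", "rib cage"));
        // All three labels denote the same entity via the concept C001.
        System.out.println(thorax.id() + ": " + thorax.synonyms());
    }
}
```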
On the other hand, terminology mapping is the task of identifying correspondences between
entities (terms or concepts) across two terminologies. Discovering these matches is intrinsically
hard to automate. Currently, we can find two approaches to achieve interoperability among
terminologies: merging and aligning the source thesauri. In the first approach, a single coherent
version is created by merging the original sources. This approach was followed in the UMLS [6] to
develop a large meta-thesaurus that reconciles differences in terminology from over 130
biomedical information sources. However, this solution is too expensive with large sources, so
often the most feasible solution is to keep the original sources separate and to add
correspondences among them. This second option is, for example, the one recommended for
developing multilingual thesauri [5] and the one followed in much research in the biomedical
domain [4, 8, 10, 13].
3. Methods
3.2. Overview
The following sections present, in detail, the three steps of our method: lexical alignment
of terms, lexical alignment of concepts, and validation of the lexical alignment. The method was
implemented in Java, run on a personal computer under Linux in some cases and Microsoft
Windows XP in others, and used an XML representation for both EMTREE
and the Metathesaurus.
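The three steps above can be sketched as follows, using toy data in place of the real EMTREE and Metathesaurus files (all identifiers and category labels here are invented; the actual system obtains normalized forms from the NormalizeString service):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the three-step method: (1) align terms lexically, (2) lift term
// anchors to concept anchors via concept-to-term relationships, (3) validate
// concept anchors by comparing top-level categories.
public class Pipeline {
    // Step 1: term anchors = pairs of terms sharing a normalized form.
    public static List<String[]> alignTerms(Map<String, String> srcNorm,
                                            Map<String, String> tgtNorm) {
        List<String[]> anchors = new ArrayList<>();
        for (Map.Entry<String, String> s : srcNorm.entrySet())
            for (Map.Entry<String, String> t : tgtNorm.entrySet())
                if (s.getValue().equals(t.getValue()))
                    anchors.add(new String[]{s.getKey(), t.getKey()});
        return anchors;
    }

    // Step 2: map each term anchor to the pair of concepts that own the terms.
    public static Set<String> liftToConcepts(List<String[]> termAnchors,
            Map<String, String> srcConceptOf, Map<String, String> tgtConceptOf) {
        Set<String> out = new TreeSet<>();
        for (String[] a : termAnchors)
            out.add(srcConceptOf.get(a[0]) + "~" + tgtConceptOf.get(a[1]));
        return out;
    }

    // Step 3: an anchor is valid when both concepts fall in similar categories.
    public static boolean isValid(String anchor, Map<String, String> category) {
        String[] p = anchor.split("~");
        return category.get(p[0]).equals(category.get(p[1]));
    }

    public static void main(String[] args) {
        List<String[]> anchors =
                alignTerms(Map.of("thorax", "thorax"), Map.of("Thorax", "thorax"));
        Set<String> concepts = liftToConcepts(anchors,
                Map.of("thorax", "E-C001"), Map.of("Thorax", "U-C100"));
        Map<String, String> cat = Map.of("E-C001", "Anatomy", "U-C100", "Anatomy");
        for (String c : concepts)
            System.out.println(c + " valid=" + isValid(c, cat));
    }
}
```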
simple mappings (thorax → Thorax and rib cage → Rib cage), and one ambiguous mapping: chest
maps to both Chest and CHEST.
Fig. 1: Example data of term anchors between EMTREE and the UMLS.
The two remaining anchors are identified as invalid because they fall in dissimilar categories
(Chest is an Anatomical concept in EMTREE, whereas Malignant neoplasm of thorax and Chest
problem are Disorders in the UMLS):
thorax-Malignant neoplasm of thorax
thorax-Chest problem
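The category check in this example can be sketched as follows (the category labels are simplified stand-ins for the EMTREE and UMLS top-level categories, not their real names):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Flag as invalid every anchor whose UMLS concept falls in a category
// dissimilar to the category of the EMTREE source concept.
public class ChestExample {
    public static List<String> invalidAnchors(String srcCategory,
                                              Map<String, String> tgtCategories) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, String> e : tgtCategories.entrySet())
            if (!e.getValue().equals(srcCategory))
                invalid.add("chest-" + e.getKey());
        return invalid;
    }

    public static void main(String[] args) {
        // "chest" is Anatomical in EMTREE; two UMLS candidates are Disorders.
        Map<String, String> candidates = new LinkedHashMap<>();
        candidates.put("Chest", "Anatomy");
        candidates.put("Malignant neoplasm of thorax", "Disorders");
        candidates.put("Chest problem", "Disorders");
        System.out.println(invalidAnchors("Anatomy", candidates));
    }
}
```

The anchor to the Anatomical concept Chest survives, while the two anchors to Disorder concepts are discarded, mirroring the example above.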
4. Results
In total, 41,534 concepts in EMTREE matched one or more concepts in the UMLS Metathesaurus
by lexical alignment. In addition to the UMLS Metathesaurus, our experiment involved a source
terminology, EMTREE, with a rich collection of synonyms: 4.18 per concept on average. Our
method exploited the fact that the knowledge in EMTREE is organized by concepts, through
concept-to-term relationships. As a consequence of taking all EMTREE synonyms into account,
the coverage increased from 66.5% (for EMTREE terms) up to 80% (for EMTREE concepts).
This confirms that if the terminologies to be mapped include a large set of synonyms, then the
coverage of lexical alignment increases. However, the higher the number of synonyms, the higher
the number of ambiguous mappings. In our experiment, 2.1 mappings per EMTREE concept were
found on average when we mapped only EMTREE preferred terms; this number increased to 3.2
mappings per EMTREE concept when we mapped all EMTREE synonyms.
On the other hand, 6 similar top-level categories across the two terminologies were identified.
Although the number of similar categories is small (6 of the 15 EMTREE top-level categories),
they cover a substantial number of EMTREE concepts: 65.3% of all EMTREE concepts and thus
75.8% of all lexical mappings found by NormalizeString.
Finally, by global checking, our method detected 6,927 (7.9%) invalid term anchors. This led to
410 invalid concept anchors (1.2% of the total).
5. Conclusions
The contribution of our work, rather than producing new alignment methods or tools, is to apply
existing lexical techniques to map large-scale terminologies and to provide a method that
improves those techniques by detecting invalid lexical mappings.
A comparison of our method against past terminology mapping research highlights some key
features. First, our method is flexible enough to validate lexical mappings in terminology
alignment through the UMLS. The only prerequisite is that one of the two terminologies must be
integrated into the UMLS Metathesaurus. Although our experiment validated mappings against
the complete UMLS Metathesaurus, our approach can also be used to create inter-terminology
mappings that use the UMLS as an external resource; to do so, the lexical technique must be
restricted to the target terminology. In addition, although our method has been tested with
NormalizeString, it can also be applied with other techniques provided by the UMLS, such as
ExactMatch, NormalizeWord or ApproxMatch. Second, our system is fully automated, requiring
no manual input or rule definitions. Third, our method has been tested and evaluated with a
large-scale real-world vocabulary against the UMLS.
Acknowledgements
This work has been funded by the Secretaría General de Política Científica y Tecnológica del
Ministerio de Educación y Ciencia, through the research project TIN2006-15453-C04-02.
References
1. Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical
terminology. Nucleic Acids Research, 32 (2004), Database issue D267-D270.
2. Cimino, J.J. Desiderata for Controlled Medical Vocabularies in the Twenty-First Century.
Methods Inf. Med., 37 (4-5) (1998), 394-403.
3. Doerr, M. Semantic problems of thesaurus mapping. Journal of Digital Information, 1 (8)
(2001).
4. Fung, K.W., Bodenreider, O., Aronson, A.R., Hole, W.T. and Srinivasan, S. Combining lexical
and semantic methods of inter-terminology mapping using the UMLS. Stud Health Technol
Inform, 129 (2007), 605-609.
5. ISO 5964, 1985. Guidelines for the establishment and development of multilingual
thesauri. International Organization for Standardization.
6. Lindberg, D., Humphreys, B. and McCray, A. The Unified Medical Language System.
Methods of Information in Medicine, 32 (1993), 281-291.
7. Rosenbloom, S.T., Miller, R.A., Johnson, K.B., Elkin, P.L. and Brown, S.H. A Model for
Evaluating Interface Terminologies. JAMIA, 15 (1) (2008), 65-76.
8. Sarkar, I.N., Cantor, M.N., Gelman, R., Hartel, F. and Lussier, Y.A. Linking biomedical
language information and knowledge resources in the 21st Century: GO and UMLS. Pacific
Symposium on Biocomputing, 8 (2003), 439-450.
9. Sun, J.Y. and Sun, Y. A system for automated lexical mapping. JAMIA, 13 (3) (2006), 334-343.
10. Vizine-Goetz, D., Hickey, C., Houghton, A. and Thompson, R. Vocabulary Mapping for
Terminology Services. Journal of Digital Information, 4 (4) (2004).
11. Yu, A.C. Methods in biomedical ontology. Journal of Biomedical Informatics, 39 (3) (2006),
252-266.
12. Zeng, M.L. and Chan, L.M. Trends and issues in establishing interoperability among
knowledge organization systems. Journal of the American Society for Information Science
and Technology, 55 (5) (2004), 377-395.
13. Zhang, S., Mork, P., Bodenreider, O. and Bernstein, P.A. Comparing two approaches for
aligning representations of anatomy. Artificial Intelligence in Medicine, 39 (2007), 227-236.
Annex I. Committees
Technical committee
Organization committee