Machine Learning in Bio-Signal Analysis and Diagnostic Imaging
By Nilanjan Dey and Amira S. Ashour
About this ebook
Machine Learning in Bio-Signal Analysis and Diagnostic Imaging presents original research on advanced analysis and classification techniques for biomedical signals and images. It covers both supervised and unsupervised machine learning models, standards, algorithms, and their applications, along with the difficulties and challenges faced by healthcare professionals in analyzing biomedical signals and diagnostic images. The intelligent systems discussed are designed based on machine learning, soft computing, computer vision, artificial intelligence, and data mining techniques. Classification and clustering techniques such as PCA, SVM, Naive Bayes, neural networks, decision trees, and association rule mining are among the approaches presented.
The design of high accuracy decision support systems assists and eases the job of healthcare practitioners and suits a variety of applications. Integrating Machine Learning (ML) technology with human visual psychometrics helps to meet the demands of radiologists in improving the efficiency and quality of diagnosis in dealing with unique and complex diseases in real time by reducing human errors and allowing fast and rigorous analysis. The book's target audience includes professors and students in biomedical engineering and medical schools, researchers and engineers.
- Examines a variety of machine learning techniques applied to bio-signal analysis and diagnostic imaging
- Discusses various methods of using intelligent systems based on machine learning, soft computing, computer vision, artificial intelligence and data mining
- Covers the most recent research on machine learning in imaging analysis and includes applications to a number of domains
Preface
Nilanjan Dey⁎, ⁎ Techno India College of Technology, Kolkata, India
Surekha Borra†, † KS Institute of Technology, Bangalore, India
Amira S. Ashour‡, ‡ Tanta University, Tanta, Egypt
Fuqian Shi§, § Wenzhou Medical University, Wenzhou, People’s Republic of China
Innovations such as telemedicine and recommender systems employ intelligent systems that can automatically predict diseases and suggest recommendations to patients based on electronic patient records (EPR) such as ECG signals and MRI, CT, X-ray, ultrasound, and PET images. Recent advances in data mining have led to the development of automatic analysis tools that help in the early detection and accurate prediction of diseases such as breast cancer, lung cancer, diabetes, heart disease, and acute kidney injury. These intelligent recommender systems are designed based on machine learning, soft computing, computer vision, artificial intelligence, and data mining techniques. Classification and clustering techniques such as PCA, SVM, Naive Bayes, neural networks, decision trees, association rule mining, random forests, and convolutional neural networks are a few among the different kinds of approaches.
The design of high-accuracy decision support systems assists healthcare practitioners, eases their work, and suits a variety of applications in the health sector. Integrating machine learning (ML) technology with human visual psychometrics helps meet the demands of radiologists for improving the efficiency and quality of diagnosis when dealing with unique and complex diseases in real time, by reducing human errors and allowing fast and rigorous analysis.
This book on Machine Learning in Bio-Signal Analysis and Diagnostic Imaging presents original and valuable research work on advanced analysis and classification techniques of biomedical signals and images, which covers the introduction, design, and optimization of techniques in both supervised and unsupervised machine learning models, standards, algorithms, applications, difficulties and challenges faced by healthcare professionals in analyzing the biomedical signals and diagnostic images.
Chapter 1 presents an ontology-based medical report mapping process (OMRMP) that represents the contents of 3654 unstructured upper gastrointestinal endoscopy reports, written in Brazilian Portuguese, in a database format. The reports describe information from bio-signals, images, and videos collected during medical procedures, and the resulting structured data can be fed to machine learning models. A satisfactory mapping performance is reported, and a comparison with previous results on smaller and simpler sets of reports suggests that the process performs well on sets of different sizes.
Chapter 2 presents a computer-aided diagnosis system for detecting multiple ocular diseases in color retinal fundus images using the problem transformation method, multiclass classifiers, and a multilevel support vector machine (MLSVM). The system covers image acquisition, preprocessing, enhancement, segmentation of the regions to be extracted, feature extraction and selection, classification, and evaluation. The reported results show 96.1% accuracy with a modest training time on the DIARET dataset.
Chapter 3 presents a DEFS-based system for the differential diagnosis of severe fatty liver and cirrhotic liver, investigating the potential of the conventional gray scale B-mode ultrasound imaging modality. The results indicate that the optimal subset of the FOS + Laws’ feature set yielded by the kNN-DEFS wrapper-based feature selection algorithm performed best, with an AA (SD) value of 99.5 (0.8). From the exhaustive experiments carried out, it can be concluded that selecting a suitable classifier in a wrapper-based feature selection algorithm can enhance the performance of the CAD system design.
Chapter 4 presents a comprehensive review of the soft computing techniques and the theory behind infrared thermography applied to medical image analysis, the focus being assessment of diabetic foot complications. A hybrid soft computing paradigm using fuzzy logic and artificial neural networks with a deep learning architecture is also discussed. The issues and challenges to be addressed in using infrared thermography for diagnostic purposes and the hardware/software/technology considerations in perspective are also presented.
In Chapter 5, the heart rate variability (HRV) of normal (NOR) subjects and of hypertension (HTN) and coronary artery disease (CAD) patients is compared using linear and nonlinear features with different classifiers, based on 5 min electrocardiogram (ECG) recordings from which the consecutive heartbeat (RR) interval tachogram is processed. The results obtained using the proposed methodology indicate that this computer-aided classification system can be used as an additional diagnostic tool to differentiate effectively between normal subjects and HTN and CAD patients.
Chapter 6 presents exhaustive experiments performed to select the optimum ROI size for a computer-assisted framework for breast tissue pattern characterization using digitized screen film mammograms. The results demonstrate that an ROI of 128 × 128 pixels manually extracted from the core region of the breast provides significant information for discriminating between breast tissue pattern classes. A higher accuracy of 91.2% is achieved by the 2-class breast density classification module, compared with 79.5% for the 4-class breast density classification framework.
Chapter 7 reviews ANN architectures from the perceptron model to the updated FNN model, along with their components (learning, error calculation, and weight update), focusing on optimizing the FNN architecture with nature-inspired algorithms (NIAs). The study found that architectures and weights are optimized in parallel, and that the FNNs considered for optimization mostly had a single hidden layer. The study also suggests that NIAs (e.g., GA, PSO, and ABC) cannot update every component of an FNN by a single method, so hybrid strategies are required to construct an optimized FNN.
Chapter 8 presents a comparative study of ensemble learning techniques such as bagging and boosting (adaptive and logistic boosting) and infers a suitable composition for motor-imagery EEG signal classification. The chapter concludes that wavelet-based Engent is a reliable feature extraction technique under the Exp-IV configurations. The Type-III architecture of experiment IV, built on a KNN base classifier with K = 7, is the best-performing ensemble composition. Multiple types of base classifiers bring higher diversity in terms of decision boundaries than the diversity drawn from varying the hyper-parameters of a single type of base classifier.
Chapter 9 introduces significant topics in multilabel classification for medical image analysis. A detailed analysis and discussion of the literature findings is presented. The performance of the methods is compared on five publicly available multilabel classification data sets (Yeast, Scene, genebase, corel5k, and BibTex) using well-known measures. Further, a CAD system framework for the existing multilabel classification research is presented.
Chapter 10 presents an overview of the techniques, tools, and challenges involved in multimodal figure search in biomedical literature to support doctors and scientists. A survey of the techniques and methods involved in building a search engine, including figure extraction, figure classification, figure indexing, search engines, and web applications, is presented.
Chapter 11 reviews the methods and challenges of applying various machine learning (ML) algorithms for image classification and security.
Chapter 12 reviews the current state of internet of medical robotic things (IoMT) related services and technologies in healthcare and monitoring. It also gives an overview of robotics in health care, health services, medical emergencies, and robotic surgery, the long-term benefits to humans of using robotics with IoMT, and the associated challenges. The need to design and develop hardware and software modules for better transmission of large and critical data over advanced communication networks is highlighted.
This book is useful to researchers, practitioners, manufacturers, professionals, and engineers in the field of biomedical systems engineering, and students may consult it for advanced material.
We would like to express gratitude to the authors for their contributions. Our gratitude is extended to the reviewers for their diligence in reviewing the chapters. Special thanks to our publisher, Elsevier Ltd.
As editors, we hope this book will stimulate further research in developing algorithms and optimization approaches related to Machine Learning in Bio-Signal Analysis and Diagnostic Imaging.
Editors
Chapter 1
Ontology-Based Process for Unstructured Medical Report Mapping
Jefferson Tales Oliva⁎,†; Huei Diana Lee†; Newton Spolaôr†; Claudio Saddy Rodrigues Coy‡; João José Fagundes‡; Maria de Lourdes Setsuko Ayrizono‡; Feng Chung Wu†,‡ ⁎ Bioinspired Computing Laboratory, Graduate Program in Computer Science and Computational Mathematics, University of São Paulo, São Carlos, Brazil
† Laboratory of Bioinformatics, Graduate Program in Electrical Engineering and Computer Science, Western Paraná State University, Foz do Iguaçu, Brazil
‡ Service of Coloproctology, Graduate Program in Medical Sciences, University of Campinas, Campinas, Brazil
Abstract
Hospitals and clinics store an increasing amount of clinical data, such as medical reports. These reports often describe, in natural language, findings from bio-signals, images, and videos collected during a medical procedure. Data mining can explore reports data to find patterns useful to assist experts’ decision making processes and medical procedure development. However, the content of medical reports is rarely organized into an appropriate format. To tackle this issue, we developed the ontology-based Medical Report Mapping Process to represent the content of unstructured reports into a database format. This chapter applied the ontology-based process to map 3654 unstructured upper gastrointestinal endoscopy reports written in Brazilian Portuguese. As a result, a satisfactory mapping performance was achieved. By comparing this result with previous ones in smaller and simpler sets of reports, this chapter suggests that the ontology-based process performs well in sets with different sizes.
Keywords
Text mining; Natural language processing; Data mining; Ontologies; Medical reports
1 Introduction
The human digestive system consists of several anatomical portions in which different abnormalities can occur [1, 2]. In particular, the upper alimentary tract (esophagus, stomach, and duodenum) is susceptible to some common diseases and conditions, such as cancer. In Brazil, for example, 7600 and 12,920 new cases of stomach cancer were estimated for the year 2016 in women and men, respectively, so an early and accurate diagnosis is imperative in the control and treatment of these diseases [3].
Therefore, upper gastrointestinal endoscopy (UGIE) is an important tool for the diagnosis and treatment of lesions in these anatomical regions. It allows experts to capture video and images from the alimentary tract during the UGIE examination. After this medical procedure, the experts usually create textual reports to record findings and information regarding the examination, supplementing the acquired media [4]. In the end, all the medical data regarding UGIE is stored in computers and other equipment, and this data is used, for example, to diagnose gastrointestinal disorders.
Data storage capacity is increasing. However, the larger the amount of stored data, the harder it is to analyze. This is also the case for medical data in many hospitals and clinics. Human analysis of this data requires valuable time from experts and is susceptible to subjectivity. In this scenario, computational methods have been applied to support the analysis of medical video, image, bio-signal, and text [5–9].
By focusing on text, one can note that this media type is essential, for example, in medical reports and records regarding UGIE or other medical procedures. To analyze this data with the assistance of computers, text mining methods are a relevant alternative. The idea is to identify, retrieve and analyze patterns in unstructured text written in natural language [10]. As a result, medical experts can obtain relevant content from a large amount of reports and textual records.
Although text mining methods are powerful tools for finding data patterns, they are susceptible to different textual issues, such as mistypings, synonym variability, irrelevant words, and missing information [11]. In this scenario, we developed, in collaboration with medical and computer experts, the ontology-based medical report mapping process (OMRMP) method and its implementation as a computational tool [12, 13]. This proposal stands out due to its ability to transform content from unstructured medical reports into a format similar to a database (DB) table. To do so, a domain-specific ontology is used to represent mapping rules that transform sets of relevant terms into values for database attributes. In particular, the ontology (a structure with classes, instances, and relations [14]) integrates experts’ knowledge and links report words with database table values. OMRMP yields an attribute-value table that is useful as an input for experts’ studies, analyses, and decision-making processes. The table is also compatible with data mining, machine learning, and other computational intelligence approaches that extract knowledge from structured data [15]. Another difference from conventional text mining methods is that the OMRMP ontology and auxiliary structures deal with textual issues that are usual in reports from the medical domain.
This chapter aims to apply the OMRMP method and tool to map a large set of 3654 artificial reports with valid terms usual in real UGIE medical reports. As a result, unstructured information from these reports, written in natural language, was successfully structured. Moreover, satisfactory results were achieved in terms of the reduction of phrases and words and the percentage of report terms mapped. These achievements are competitive with the ones from previous work on smaller sets of UGIE reports, with fewer textual patterns to map.
This chapter is organized as follows: Section 2 presents some pieces of related work. Section 3 describes the OMRMP method and its computational implementation. Section 4 details the experimental setup used in this chapter, and Section 5 reports and discusses the results obtained by applying OMRMP to 3654 textual reports. Section 6 concludes this chapter with final highlights.
2 Related Work
Some concepts used in related work are named-entity recognition (NER) and the Unified Medical Language System (UMLS) [16], which can be used in applications related to natural language processing (NLP) and text mining. The former concept is an alternative for extracting meaningful terms, such as the names of diseases and genes, from unstructured biomedical text [17, 18]. UMLS, in turn, consists of a collection of ontologies and vocabularies from distinct domains that can be used to support text processing methods [19].
Lee et al. [20] developed a method that illustrates the application of NER to process a corpus composed of MEDLINE abstracts [21]. After finding terms that delimit named entities, the proposal categorizes them into semantic classes described in an ontology. An important component of this method, a machine learning algorithm named Support Vector Machines [15], is used to identify entity boundaries and to classify the entities found.
Another piece of related work performs two additional steps before NER application [22]. In particular, the first step uses natural language processing techniques to split pieces of text into sentences before tagging its words with appropriate parts of speech. Nouns are then submitted to the second step to create groups of words representing noun phrases. Afterwards, NER matches the obtained phrases with concepts of an ontology to yield named entities.
NER and UMLS were combined in Khordad et al. [23]. In particular, a system was proposed to identify a type of named entity (phenotype names) and related information in biomedical text. To this end, a computational tool titled MetaMap [24] is used to associate input text with components from a phenotype ontology and the UMLS Metathesaurus, a large dictionary linking synonyms from different vocabularies.
The SemPathFinder system was proposed in Song et al. [25] to discover relations considered unknown in biomedical texts. For this, text mining methods and UMLS are used to extract, in each text sentence, entities, and their relations.
Scuba et al. [26] presented the web-based tool called Knowledge Author, which aims to facilitate the representation of domain content in an ontology for information extraction in clinical texts. This tool supports the search for terms in the UMLS Metathesaurus database. The Knowledge Author was applied in 34 clinical free-text radiology reports to extract concepts related to carotid stenosis.
A more recent piece of work applies recurrent neural networks, a deep learning representative, to recognize named entities from text [27]. By doing so, the dependence on an appropriate feature set to feed machine learning algorithms is reduced. Experimental evaluations in two corpora showed competitive results.
Becker et al. [28] developed a web system that helps to transform concepts from free-text case definitions into UMLS codes. To ameliorate the complexity of free-text processing, the user can revise the extracted concepts, adding, removing, or expanding them to related concepts.
A recommendation system for biomedical ontologies (Ontology Recommender 2.0) was developed in the National Center for Biomedical Ontology [29]. This system processes a text corpus or a keyword list to suggest appropriate ontologies for the terms present in the processed text.
In other work [30], a system called Casama (Contextualized Semantic Maps) [31], designed for the representation and summarization of biomedical literature, was applied to a collection of articles about lung cancer to summarize experimental conditions (study design and outcome measures) and patient/population-level information (properties of the population considered in the study). To do so, Casama uses semantic maps, which are composed of the set of relations that describe the main findings of clinical studies.
Besides NER and UMLS, other ideas can be used by computational approaches to extract information from unstructured biomedical text. An example from the literature is based on a domain-specific knowledge dictionary [32]. Such a dictionary contains rules that make it possible to map the content of medical reports into structured databases. Despite its ability to process biomedical text, this approach uses a dictionary with little flexibility to represent domain knowledge. To allow users to represent more sophisticated and complete rules, as well as to explore object-oriented programming concepts to link report terms and database attribute values, the dictionary was replaced with an ontology in OMRMP [12, 13]. The proposed method also included support for more text mining procedures, such as lemmatization.
In what follows, OMRMP and the associated computational tool are described, due to their relevance in the experimental evaluation conducted in this chapter.
3 Ontology-Based Medical Report Mapping Process
The OMRMP is applied in two phases. In the first phase, text-processing techniques are applied to textual reports to find relevant patterns, which are used to build the structures required to standardize reports and map their content into a DB. The standardization and mapping tasks themselves are performed in the second OMRMP phase [33].
3.1 First OMRMP Phase
The following methods are applied in the first OMRMP phase (Fig. 1):
•Unique phrase identification: the content of all reports is concatenated into a text file. Subsequently, the phrases in this file are sorted alphabetically and repeated phrases are removed. Thus, one entry is kept for each phrase, resulting in a structure called the unique phrase set (UPS) [12]. This approach makes it possible to view all the different phrases of a report set in a simple structure, which makes relevant patterns easier to identify. Other text processing methods are then applied to the UPS to reduce phrase variability and make it easier to build the structures required by the next OMRMP phase. It is important to emphasize that, after each text processing technique is applied to a UPS, unique phrase identification is reapplied to the resulting structure.
•UPS normalization: the UPS1 (built by the previous method) content is normalized to replace uppercase and/or accented characters by equivalent lowercase and nonaccented characters, generating the UPS2 [34]. This method is useful mainly for text written in languages (e.g., Portuguese) that use accentuation in words. Also, reports can contain sentences that are considered different by computational processes only because some character is uppercase. For example, the character "a" has variations such as "A", "á", "Á", "à", "À", "ã", "Ã", "â", and "Â".
•Stopword removal: the previous UPS is used to identify terms that are considered irrelevant (stopwords) [35]. Disposable words can be prepositions, adverbs, special characters, and other particular words. The irrelevant terms compose a list named the stoplist, which is used by the stopword removal technique to generate the UPS3 and to preprocess textual reports. The stoplist is represented in XML (Extensible Markup Language). Fig. 2 presents a stoplist example represented in XML, where the attribute "number" is the number of irrelevant terms added to the stopword list.
Fig. 2 Stoplist structure example.
•Lemmatization application: each word is morphologically reduced to its canonical form (lemma) [36]. For example, verbs, plural nouns, and feminine terms are transformed into their infinitive, singular, and masculine counterparts, respectively. This method generates the UPS4.
•Standardization application: a standardization file (SF) is built, together with domain experts, to standardize textual reports in OMRMP Phase 2. In this file, standardization rules (SR) are defined to replace synonyms by a simple word or phrase, keeping the same meaning [37]. After the SF is built, it is applied to generate the UPS5. The SF can also be expanded with more SRs in other experiments, since the complexity increases when more or other reports from the same domain are used in the OMRMP process [34]. Like the stoplist, the SF is represented in XML. Fig. 3 presents an SF structure example, where the "number" attribute is the number of SRs, the "synonym" tag represents an SR, the "old" tag describes a sentence/term to be replaced, the "new" tag corresponds to a new sentence/term that will replace the one represented by "old", and the "n" attribute is the number of new sentences/terms that will replace at once the content presented in the "old" tag.
Fig. 3 SF structure example.
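As an illustration of the SF elements described above, the following minimal Python sketch reads standardization rules from an XML fragment. Only the "number" and "n" attributes and the "synonym", "old", and "new" tags come from the text; the root element name ("standardization"), the placement of "n" on the "synonym" tag, and the sample rules are assumptions made for this example and may differ from the actual file shown in Fig. 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical SF content. The root tag name and the sample rules are
# illustrative assumptions, not the authors' actual file.
SF_XML = """
<standardization number="2">
  <synonym n="1">
    <old>lesao ulcerada</old>
    <new>ulcera</new>
  </synonym>
  <synonym n="2">
    <old>esofago e estomago normais</old>
    <new>esofago normal</new>
    <new>estomago normal</new>
  </synonym>
</standardization>
"""


def load_rules(xml_text):
    """Return {old phrase: [new phrases]} and check the declared counts."""
    root = ET.fromstring(xml_text.strip())
    rules = {}
    for synonym in root.findall("synonym"):
        old = synonym.findtext("old").strip()
        new_phrases = [new.text.strip() for new in synonym.findall("new")]
        # "n" is assumed to sit on the synonym tag and to count its <new> entries.
        if len(new_phrases) != int(synonym.get("n")):
            raise ValueError(f"rule for {old!r} declares a wrong n")
        rules[old] = new_phrases
    # "number" is the number of standardization rules in the file.
    if len(rules) != int(root.get("number")):
        raise ValueError("number attribute does not match the rule count")
    return rules


rules = load_rules(SF_XML)
```

Checking the declared counts against the parsed content is a design choice of this sketch: it catches a rule file that was edited by hand without updating its attributes.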
Fig. 1 Schematic representation of the first OMRMP phase. Modified from J.T. Oliva, Automation of the Medical Report Mapping Process for a Structured Representation (Masters dissertation), State University of West Paraná, Foz do Iguaçu, Brazil (in Portuguese).
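The first-phase steps can be sketched in a few lines of Python. This is a simplified illustration rather than the OMRMP tool itself: splitting phrases on periods, the function names, the plain substring replacement, and the sample stoplist and rules are all assumptions, and lemmatization (UPS4) is omitted because it needs a language-specific lemmatizer.

```python
import unicodedata


def unique_phrases(phrases):
    """Sort phrases alphabetically and keep one entry per phrase (a UPS)."""
    return sorted(set(p for p in phrases if p))


def normalize(phrase):
    """Lowercase the phrase and strip accents, e.g. 'Á' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", phrase.lower())
    # After NFD decomposition, accents are separate combining marks (category Mn).
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def remove_stopwords(phrase, stoplist):
    """Drop every word that appears in the stoplist."""
    return " ".join(word for word in phrase.split() if word not in stoplist)


def standardize(phrase, rules):
    """Replace each synonym with its standard form (plain substring replace)."""
    for old, new in rules.items():
        phrase = phrase.replace(old, new)
    return phrase


def first_phase(reports, stoplist, rules):
    # Assumption: phrases are delimited by periods.
    phrases = [p.strip() for report in reports for p in report.split(".")]
    ups = unique_phrases(phrases)                                     # UPS1
    ups = unique_phrases(normalize(p) for p in ups)                   # UPS2
    ups = unique_phrases(remove_stopwords(p, stoplist) for p in ups)  # UPS3
    # Lemmatization (UPS4) is skipped in this sketch.
    ups = unique_phrases(standardize(p, rules) for p in ups)          # UPS5
    return ups


reports = [
    "Esôfago normal. Estômago com úlcera.",
    "ESOFAGO NORMAL. Estomago com ulcera gastrica.",
]
ups5 = first_phase(reports, stoplist={"com"}, rules={"ulcera gastrica": "ulcera"})
```

Unique phrase identification is reapplied after every step, as the text requires, so phrases that become identical after normalization or standardization collapse into a single entry.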
After the previous text processing techniques have been applied to generate the UPSs, the last structure (e.g., UPS5) is analyzed, together with domain experts, to build an ontology, which is used to generate a DB and to map relevant patterns in report content to this database. To do so, attributes and Mapping Rules (MR) are defined to represent relevant information that can be found in medical reports [33]. In the ontology, each MR is associated with an attribute, and these combinations determine how the database is filled [38].
The MR represents a phrase whose content may be in one of the following formats: "location characteristic" or "location characteristic subcharacteristic". Location is a term that describes a human body part. Characteristic is information regarding the condition of the location, such as an abnormality or other relevant information. Subcharacteristics complement particular characteristics (e.g., the measure of an injury). For example, the phrase "middle esophagus with 5 mm ulcer" contains the following MR components: location ("middle esophagus"), characteristic ("ulcer"), and subcharacteristic ("5 mm") [39].
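To make the MR components concrete, the following Python sketch splits a phrase into location, characteristic, and subcharacteristic. It is only an illustration: in OMRMP these terms come from the ontology, whereas here the term lists, the regular expression for measures in millimeters, and the function name are hypothetical stand-ins.

```python
import re

# Hypothetical term lists standing in for the ontology content.
LOCATIONS = ["esophagus", "middle esophagus", "stomach", "duodenum"]
CHARACTERISTICS = ["ulcer", "normal"]
# Assumption: the only subcharacteristic handled here is a size in mm.
MEASURE = re.compile(r"(\d+(?:\.\d+)?)\s*mm")


def parse_mapping_rule(phrase):
    """Return the MR components found in a phrase (None when absent)."""
    phrase = phrase.lower()
    # Try longer locations first so "middle esophagus" wins over "esophagus".
    by_length = sorted(LOCATIONS, key=len, reverse=True)
    location = next((loc for loc in by_length if loc in phrase), None)
    characteristic = next((c for c in CHARACTERISTICS if c in phrase), None)
    match = MEASURE.search(phrase)
    subcharacteristic = f"{match.group(1)} mm" if match else None
    return {
        "location": location,
        "characteristic": characteristic,
        "subcharacteristic": subcharacteristic,
    }


components = parse_mapping_rule("middle esophagus with 5mm ulcer")
```

A phrase in the shorter "location characteristic" format, such as "stomach normal", simply yields no subcharacteristic.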
The OMRMP ontology is structured in the Web Ontology Language (OWL), which is used for knowledge representation through the definition of taxonomies and classification networks [40]. This language was chosen because its resources increase flexibility for ontology development, expanding the potential for representing relevant information in these structures [39]. The classes considered in our ontology are the following:
•Thing: the main class in the OWL language. All components of an OWL ontology are connected to this class.
•Attribute: this class contains information about the attributes that compose the DB. The Attribute class is composed of two subclasses, which contain relevant information for each attribute and its MRs:
–Attribute name: label assigned to the attribute.
–Attribute type: possible values for each characteristic or subcharacteristic of a MR.
•Term: possible words that can be found in phrases of medical reports. This class represents MRs. The following subclasses describe the terms that compose an MR:
–Region: terms that describe a human body part (location).
–Observations: terms that describe the condition of the location. The following subclasses describe the other two MR components:
⁎Characteristic.
⁎Subcharacteristic.
Fig. 4 shows an ontology structure example considered in the OMRMP. In this figure, rectangles represent classes and lines outline hierarchical relationship among these classes.
Fig. 4 Ontology structure example.
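The class hierarchy listed above can be mirrored in a small Python structure for illustration. This is a sketch under simplifying assumptions: a nested dictionary stands in for the OWL file, the identifiers AttributeName and AttributeType are written without spaces for convenience, and the helper functions are hypothetical.

```python
# Nested dict mirroring the classes named in the text; each key is a class
# and its value holds the subclasses. This is not an OWL serialization.
ONTOLOGY = {
    "Thing": {
        "Attribute": {
            "AttributeName": {},
            "AttributeType": {},
        },
        "Term": {
            "Region": {},
            "Observations": {
                "Characteristic": {},
                "Subcharacteristic": {},
            },
        },
    }
}


def class_paths(tree, prefix=()):
    """List every class as a tuple running from the root down to the class."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(path)
        paths.extend(class_paths(children, path))
    return paths


def superclasses(tree, target):
    """Return the ancestors of a class, from the root downwards."""
    for path in class_paths(tree):
        if path[-1] == target:
            return list(path[:-1])
    return None


ancestors = superclasses(ONTOLOGY, "Characteristic")
```

Here `ancestors` is the chain Thing, Term, Observations, which corresponds to the lines connecting the rectangles in the figure.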
It should be emphasized that once the first OMRMP phase is applied to textual reports of a particular domain, the stoplist, SF, and ontology can be reused by this process in other experimental evaluations in the same