
International Journal of Computational Intelligence and Information Security, September 2012, Vol. 3, No. 7, ISSN: 1837-7823

A Proposed Approach for Structuring Radiology Reports


Mehdi Ahmadi (1) and Reza Rafeh (2)
(1, 2) Department of Computer Engineering, Faculty of Engineering
(1) Islamic Azad University, Malayer Branch
(2) Arak University, Arak 38156-8-8349, Iran
r-rafeh@araku.ac.ir (corresponding author)

Abstract

Medical databases contain both structured and unstructured data. Textual radiology reports are usually stored in databases in unstructured form. In this article we propose a method for structuring textual radiology reports. We extract patterns based on the concepts found in textual radiology reports and use a Naive Bayes classifier to structure the reports according to these patterns. The experimental results show that the proposed approach can structure radiology reports with high precision.

Keywords: Radiology Report, Structured Report, Naive Bayes Classifier.

1. Introduction
Although imaging technologies have undergone dramatic evolution over the past century, radiology reporting has remained largely static, in both content and structure. Existing free-text reports have been criticized for a number of inherent deficiencies, including inconsistencies in content, structure, organization, and nomenclature [1]. Radiology reports are a form of clinical text that contains rich and useful information that is difficult to extract by automated means [2]. They are important clinical elements of patients' healthcare records and assist in healthcare decisions [3]. The rapid growth of digitized medical records presents new opportunities for mining terabytes of data that may yield new information and knowledge, which could assist medical practitioners in a myriad of ways [4]. Because most reports are in free-text format, the valuable information they contain cannot easily be accessed and used unless proper text mining is applied [5]. Structuring free-text reports bridges the gap between users and the report database and makes the information contained in the reports readily accessible [6]. In [7], 160,000 radiology reports were analyzed to design a set of templates, on the basis of which data are entered into the computer system in structured form. [8] presents a system for structured data entry and reporting that generates reports encoded in the Standard Generalized Markup Language (SGML). In [9], a study evaluated the feasibility of a monitoring system based on natural language processing (NLP) to screen for healthcare-associated pneumonia in neonates. In [10], the effect of using the Specialist Lexicon to improve noun phrase identification within clinical radiology documents was investigated. In [11], a combination of regular-expression matching and grammatical parsing was proposed.
In [12], natural language processing techniques were applied to extract medical findings and their attributes from free text and to output them in structured form using medical lexicons. In [13], a method was proposed that searches the structured reports and images in the database and returns those that match a structured query. There are two retrieval modes: exact match and partial match. In exact-match mode, only reports containing exactly the same findings are retrieved, whereas partial-match mode returns reports with similar findings. The system implemented in [14] provides a keyword-based and semantic-driven data matching methodology to extract specific information from textual clinical documents. The matching methodology can recognize selected keywords and their related semantics in the documents, and through an extraction verification interface clinicians can extract and verify the matched information semi-automatically. In [15], a Hybrid Lung Nodule Detection (HLND) system based on an artificial neural network architecture and an interactive knowledge-based system was developed for object detection in noisy image environments; that work describes the system architecture and its application to the detection and classification of nodules in pulmonary radiology images of lung cancer. The approach in [16] focuses on the language domain of mammography reports, in which radiologists describe the features or structures that they do or do not see in the image. Such a report is essentially metadata about the image written by a human subject-matter expert, so these reports could be mined and used as supplemental metadata to train a computer-assisted detection (CAD) system more effectively; [16] learns cue phrase patterns in radiology reports using a genetic algorithm (GA) as the learning method. In [17], RADICAL (RADIology Content ALignment) was proposed, a sequence alignment method that uses dynamic programming to efficiently extract templates that are common across sets of reports.

Research work and tools for structured radiology reporting can be divided into two categories: tools for entering data in structured form, usually through predetermined menus, and methods that structure radiology reports entered as free text, using text mining techniques, artificial intelligence, or patterns. In this article, we propose a structuring method based on the concepts present in unstructured textual radiology reports. The rest of the paper is organized as follows. Section 2 describes patterns and their properties. Section 3 discusses items in detail. Section 4 describes how patterns are mapped to reports and presents our experimental results. Finally, Section 5 concludes the paper.

2. PATTERN
Analysis of radiology findings has shown them to be complex arrangements of basic medical concepts. The number of possible permutations of radiology findings suggests that enumerating them would be impractical. However, the relations between the concepts in each radiology finding appear to be relatively few, and the clinical content of radiology findings may be adequately structured [18], [19]. The pattern proposed in this article is based on the concepts present in each radiology report; by combining these concepts, it can cover the complex combinations found in radiology reports. In other words, instead of studying complex combinations and trying to separate their relations, we determine simple relations between concepts and specify complex combinations by combining them. We call these concepts patterns. Patterns are determined manually after analyzing radiology reports. Some of the patterns determined for radiology reports are:

Heart size is normal.
Cervical spine is normal.
Bilateral cervical rib was seen.
The size and thickness of the cortex in both kidneys is normal.
Lung parenchyma is abnormal.

As mentioned, a pattern represents a concept in a textual radiology report. Patterns are combinations of simpler concepts, and these simple concepts cannot be simplified further. In this article, these atomic concepts are called pattern items; patterns are created by combining pattern items. For example, the pattern "Heart size is normal" consists of four pattern items: accept, size, heart, and normal. Pattern items are also determined manually after analyzing radiology reports. Pattern items have the following properties.

Original Property
If a pattern item represents an original concept of a pattern, this property is assigned to it. For example, studying the pattern "Bilateral cervical rib was seen" shows that it is about cervical and rib; as a result, the two pattern items rib and cervical have the Original property. A pattern can be attributed only to radiology reports that include all of its original pattern items.

At Least One Original Property
In the pattern "The size and thickness of the cortex in both kidneys is normal", the original concepts are the kidneys, the thickness of the cortex, and the size of the cortex. If at least one of the pattern items thickness of the cortex or size of the cortex is mentioned, the pattern can be attributed to the radiology report. In this case, the two pattern items thickness of the cortex and size of the cortex are assigned the At Least One Original property, so that if at least one of them is mentioned, the pattern can potentially match. Note that the kidneys item has the Original property, and if the report does not contain this concept, the pattern cannot be attributed to the radiology report.

Can Delete Property
Some pattern items help us understand the concept better, but removing them makes no difference to the decision of whether the pattern is assigned. For example, in the pattern "Bilateral cervical rib was seen", the original concept can still be inferred after removing Bilateral, although it is inferred more precisely when Bilateral is present.
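The three properties can be read as hard rules for deciding whether a pattern may be attributed to a report. The following is a minimal Python sketch of those rules, not the authors' implementation: the names `Prop`, `PatternItem`, `Pattern`, `can_attribute`, `CERVICAL_RIB` and `KIDNEY_CORTEX` are hypothetical, item detection in the report text is assumed to have already happened, and the two example patterns include only the items whose properties are named in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Prop(Enum):
    """The three pattern-item properties described in this section."""
    ORIGINAL = "Original"
    AT_LEAST_ONE_ORIGINAL = "At Least One Original"
    CAN_DELETE = "Can Delete"


@dataclass(frozen=True)
class PatternItem:
    name: str
    prop: Prop


@dataclass(frozen=True)
class Pattern:
    text: str
    items: tuple[PatternItem, ...]


def can_attribute(pattern: Pattern, report_items: set[str]) -> bool:
    """Hypothetical helper: apply the item properties as hard rules."""
    originals = [i for i in pattern.items if i.prop is Prop.ORIGINAL]
    alternatives = [i for i in pattern.items if i.prop is Prop.AT_LEAST_ONE_ORIGINAL]
    # Every Original item must occur in the report.
    if not all(i.name in report_items for i in originals):
        return False
    # If the pattern has At Least One Original items, one of them suffices.
    if alternatives and not any(i.name in report_items for i in alternatives):
        return False
    # Can Delete items are never checked, so they cannot block a match.
    return True


CERVICAL_RIB = Pattern(
    "Bilateral cervical rib was seen.",
    (
        PatternItem("cervical", Prop.ORIGINAL),
        PatternItem("rib", Prop.ORIGINAL),
        PatternItem("bilateral", Prop.CAN_DELETE),
    ),
)

KIDNEY_CORTEX = Pattern(
    "The size and thickness of the cortex in both kidneys is normal.",
    (
        PatternItem("kidneys", Prop.ORIGINAL),
        PatternItem("size of the cortex", Prop.AT_LEAST_ONE_ORIGINAL),
        PatternItem("thickness of the cortex", Prop.AT_LEAST_ONE_ORIGINAL),
    ),
)

print(can_attribute(CERVICAL_RIB, {"cervical", "rib"}))                      # True
print(can_attribute(KIDNEY_CORTEX, {"kidneys", "thickness of the cortex"}))  # True
print(can_attribute(KIDNEY_CORTEX, {"size of the cortex"}))                  # False
```

In the first call the report does not mention "bilateral", yet the pattern still matches, which is exactly what the Can Delete property allows; the last call fails because the Original item kidneys is missing.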

3. ITEM
Facts consist of domain-specific terms [23]. A term is usually a word, but it can also be a word pair or a phrase [24]. The items in pattern items are words, word pairs, or sets of words and word pairs. Because the concepts expressed in radiology reports are extracted using a set of items, one routine way to improve pattern mapping is to handle polysemy and synonymy. Polysemy refers to a single word having multiple meanings, and synonyms are different words having the same meaning; synonymy is handled by creating relations between items. The text of radiology reports may combine several languages, which causes problems for multilingual support; research on multilingual reports and their problems is presented in [25], [26]. However, radiology concepts are specific, and radiologists express these shared concepts in different languages and expressions. Since the pattern proposed in this paper is based on radiology concepts, we define just one pattern for each radiology concept, and for each language the corresponding items are defined in that language.
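One way to relate items to their synonyms and to their wording in other languages, so that a single pattern per concept serves every language, is a lexicon from each canonical item to its surface forms. The sketch below is illustrative only: the lexicon contents, the language codes, and the function `find_items` are hypothetical, matching is assumed to be case-insensitive whole-word regular-expression search, and polysemy (which needs context) is not handled.

```python
import re

# Hypothetical lexicon: each canonical pattern item lists the surface forms
# (synonyms) that express it, grouped by language code.
LEXICON = {
    "heart": {"en": ["heart", "cardiac"], "fa": ["قلب"]},
    "size": {"en": ["size", "dimension"]},
    "normal": {"en": ["normal", "unremarkable"]},
}


def find_items(report: str, lexicon: dict = LEXICON) -> set[str]:
    """Return the canonical items whose surface forms occur in the report."""
    text = report.lower()
    found = set()
    for item, forms_by_language in lexicon.items():
        forms = [f for fs in forms_by_language.values() for f in fs]
        # Whole-word match, so "normal" is not found inside "abnormal".
        if any(re.search(rf"\b{re.escape(f.lower())}\b", text) for f in forms):
            found.add(item)
    return found


print(sorted(find_items("Cardiac size is unremarkable.")))  # ['heart', 'normal', 'size']
print(find_items("Lung parenchyma is abnormal."))          # set()
```

Because every surface form resolves to the same canonical item, the patterns themselves never need to change when a new synonym or language is added; only the lexicon grows.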

4. PATTERN MAPPING
As mentioned earlier, patterns represent concepts, so if a pattern can be attributed to a report, the report contains the concepts of that pattern. Attribution is done by searching the report for the pattern items, taking the items' properties into account. To match pattern items against radiology reports, we use a Naive Bayes classifier. The probabilities assigned to pattern items in the Naive Bayes classifier are explained below.

A. Original Property

Since these pattern items must be observed for the pattern to apply, the probability for an original item is obtained from Equations (1) and (2), where PA denotes a pattern, PO a pattern item of PA that has the Original property, and $\overline{PO}$ the absence of that item from the report. If the original pattern item PO of pattern PA occurs in the radiology report, Equation (1) is used; otherwise, Equation (2) is used.

$$P(PO \mid PA) = 1 \tag{1}$$

$$P(\overline{PO} \mid PA) = 0 \tag{2}$$

B. At Least One Original Property

The probability for an At Least One Original item is obtained from Equations (3) and (4), where PA denotes a pattern, AO a pattern item of PA that has the At Least One Original property, $\overline{AO}$ the absence of that item from the report, and AOCount the number of pattern items in PA that have the At Least One Original property.

$$P(AO \mid PA) = \frac{2^{AOCount} - 1}{2^{AOCount}} \tag{3}$$

$$P(\overline{AO} \mid PA) = 1 \tag{4}$$
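As a worked instance, take the kidney-cortex pattern from Section 2, whose two At Least One Original items are size of the cortex and thickness of the cortex, so AOCount = 2. Assuming Equation (3) has the form (2^AOCount − 1)/2^AOCount:

$$P(AO \mid PA) = \frac{2^{2} - 1}{2^{2}} = \frac{3}{4}$$

One way to read this fraction: of the 2^AOCount possible presence/absence combinations of the At Least One Original items, all except the single combination in which none of them is present satisfy the at-least-one requirement.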

C. Can Delete Property

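For a pattern item with the Can Delete property, the probabilities of its presence and of its absence are both given by Equation (5), where PA denotes a pattern, DO a pattern item of PA that has the Can Delete property, and $\overline{DO}$ the absence of that item from the report. Because both cases receive the same value, observing or not observing such an item does not change whether the pattern is assigned.

$$P(DO \mid PA) = P(\overline{DO} \mid PA) = 0.5 \tag{5}$$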
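In a Naive Bayes classifier, the per-item probabilities are multiplied into a likelihood for each pattern. The Python sketch below illustrates this under several assumptions that the text does not settle: Equation (3) is taken as (2^AOCount − 1)/2^AOCount; the At Least One Original items are scored as one group (Equation (3) if at least one of them is observed, 0 otherwise, following the rule in Section 2); Equation (4) is not used because the text does not say when it applies instead of Equation (3); the prior P(PA) defaults to 1; and a positive score is read as "the pattern can be attributed". All names (`pattern_likelihood`, `at_least_one_factor`, `KIDNEY_CORTEX`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from math import prod


class Prop(Enum):
    ORIGINAL = "Original"
    AT_LEAST_ONE_ORIGINAL = "At Least One Original"
    CAN_DELETE = "Can Delete"


@dataclass(frozen=True)
class PatternItem:
    name: str
    prop: Prop


def at_least_one_factor(ao_count: int) -> float:
    """Equation (3) as assumed here: (2^AOCount - 1) / 2^AOCount."""
    return (2 ** ao_count - 1) / 2 ** ao_count


def pattern_likelihood(items: list[PatternItem], observed: set[str],
                       prior: float = 1.0) -> float:
    """Multiply per-item factors into a Naive Bayes style score for one pattern."""
    factors = [prior]  # ASSUMPTION: uniform prior P(PA) unless one is supplied
    for item in items:
        if item.prop is Prop.ORIGINAL:
            # Eq. (1) if the item occurs in the report, Eq. (2) otherwise.
            factors.append(1.0 if item.name in observed else 0.0)
        elif item.prop is Prop.CAN_DELETE:
            # Eq. (5): presence and absence both get 0.5, so the item is uninformative.
            factors.append(0.5)
    ao_items = [i for i in items if i.prop is Prop.AT_LEAST_ONE_ORIGINAL]
    if ao_items:
        # ASSUMPTION: the At Least One Original items are scored as one group.
        hit = any(i.name in observed for i in ao_items)
        factors.append(at_least_one_factor(len(ao_items)) if hit else 0.0)
    return prod(factors)


# Hypothetical item list for "The size and thickness of the cortex in both
# kidneys is normal."; treating "both" as Can Delete is an assumption.
KIDNEY_CORTEX = [
    PatternItem("kidneys", Prop.ORIGINAL),
    PatternItem("size of the cortex", Prop.AT_LEAST_ONE_ORIGINAL),
    PatternItem("thickness of the cortex", Prop.AT_LEAST_ONE_ORIGINAL),
    PatternItem("both", Prop.CAN_DELETE),
]

score = pattern_likelihood(KIDNEY_CORTEX, {"kidneys", "size of the cortex"})
print(score, score > 0)  # prints: 0.375 True
```

A missing Original item, or a report that mentions none of the At Least One Original items, drives the product to zero, which mirrors the attribution rules of Section 2; the constant 0.5 factor of a Can Delete item scales the score but never decides it.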


4. EXPERIMENTAL RESULTS
For the experiments, we used 214 chest radiology reports from a medical center in Isfahan, Iran. Eighty percent of the reports were used for training and 20 percent for testing. Patterns and pattern items were determined manually after analyzing the radiology reports. The training results are listed in Table 1 and the test results in Table 2.

Table 1: Results of training

Observation                              Quantity
Radiology reports                        171
Total patterns                           94
Total pattern items                      858
Average pattern items per pattern        9.13
Average pattern items per report         5.2

Table 2: Results of testing the proposed pattern

Observation                              Quantity
Radiology reports                        22
Total patterns                           136
Total discovered patterns                126
Total undiscovered patterns              10

To compare the proposed pattern with a Naive Bayes classifier, 80% of the data were used for training and 20% for testing the Naive Bayes classifier. The results for the Naive Bayes classifier are listed in Table 3.

Table 3: Results of testing the Naive Bayes classifier

Observation                              Quantity
Radiology reports                        22
Total patterns                           136
Total discovered patterns                122
Total undiscovered patterns              14

Precision is computed with Equation (6), where TP is the number of patterns mapped correctly and FP is the number of patterns not mapped correctly.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

Based on Table 2, TP = 126 and FP = 10, so the precision of the proposed approach is 0.9265. Based on Table 3, TP = 122 and FP = 14, so the precision of the Naive Bayes classifier is 0.8971. Figure 1 compares the precision of the proposed approach with that of similar works; as the figure shows, the proposed approach achieves higher precision than previous works.

Figure 1: Comparison of the results of similar works with the proposed pattern
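The two precision values reported in this section can be reproduced from the counts in Tables 2 and 3. This short Python check follows the paper's definition of Equation (6), in which FP counts the test patterns that were not mapped correctly (the undiscovered patterns); the names `precision` and `results` are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (6): Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, FP) counts taken from Tables 2 and 3 (22 test reports, 136 patterns).
results = {
    "proposed approach": (126, 10),
    "Naive Bayes classifier": (122, 14),
}

for name, (tp, fp) in results.items():
    assert tp + fp == 136  # every test pattern is either discovered or not
    print(f"{name}: {precision(tp, fp):.4f}")
# prints:
# proposed approach: 0.9265
# Naive Bayes classifier: 0.8971
```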

5. Conclusion and Future Research


In this paper we proposed a new approach for structuring radiology reports that identifies items and their properties and maps them to predetermined structures. Our experimental results show that the proposed approach is more precise than similar methods. We are currently working on determining the items in reports automatically.

Acknowledgment
The authors thank Mr. Mohammad Reza Shirvani, radiology expert at the Medical Sciences University, Tabriz, Iran, and Mr. Sadegh Marani, radiology expert and director of the Kosar Clinic in Isfahan, Iran, for their valuable guidance.

References
[1] N. Knight, L. Eliot, Siegel, "Radiology Reporting, Past, Present, and Future," Journal of the American College of Radiology, May 2007, pp. 313-319.
[2] S. Carrell, L. Miglioretti, R. Bindman, "Using Natural Language Processing to Explore Schemes for Assessing Disease Risk from Free-Text Radiology Reports," Clinical Medicine & Research, December 2001, pp. 148-149.
[3] R. Noumeir, "Radiology interpretation process modeling," Journal of Biomedical Informatics, vol. 39, April 2006, pp. 103-114.
[4] S. Shanmuganathan, N. Ghotbi, "Text Mining of Medical Records for Radiodiagnostic Decision-Making," Journal of Computers, vol. 3, January 2008.
[5] T. Gong, C. Lim Tan, T. Yun Leong, C. Kiang Lee, B. Chuan Pang, T. Lim, Q. Tian, S. Tang, Z. Zhang, "Text Mining in Radiology Reports," Eighth IEEE International Conference on e-Science, February 2009, pp. 815-820.
[6] K. Prasad, R. Ramakrishna, S. Kumar, P. Rani, "Extraction of Radiology Reports using Text mining," International Journal on Computer Science and Engineering, vol. 2, 2010, pp. 1558-1562.
[7] G. D. Berman, R. N. Gray, D. Liu, J. J. Tyhurst, "Structured radiology reporting: a 4-year case study of 160,000 reports," Integrating the Healthcare Enterprise (IHE) Symposium of the Radiological Society of North America (RSNA), Nov 2001, pp. 1-12.
[8] C. E. Kahn, Jr, "Self-documenting structured reports using open information standards," Medinfo, 1998, pp. 403-407.
[9] E. A. Mendonça, J. Haas, L. Shagina, E. Larson, C. Friedman, "Extracting information on pneumonia in infants using natural language processing of radiology reports," Journal of Biomedical Informatics, vol. 38, August 2005, pp. 314-321.
[10] Y. Huang, H. J. Lowe, D. Klein, R. J. Cucina, "Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon," Journal of the American Medical Informatics Association, vol. 12, June 2005, pp. 275-285.
[11] Y. Huang, H. J. Lowe, "A Novel Hybrid Approach to Automated Negation Detection in Clinical Radiology Reports," Journal of the American Medical Informatics Association, vol. 14, June 2007, pp. 304-311.
[12] T. Gong, C. L. Tan, T. Y. Leong, C. K. Lee, B. C. Pang, C. C. Tchoyoson Lim, Q. Tian, S. Tang, Z. Zhang, "Text Mining in Radiology Reports," IEEE International Conference on Data Mining 2008 (ICDM 08), Dec 2008, pp. 815-820.
[13] K. Prasad, S. Ramakrishna, S. Kumar, P. Rani, "Extraction of Radiology Reports using Text mining," International Journal on Computer Science and Engineering, vol. 2, 2010, pp. 1558-1562.
[14] C. H. Chen, X. O. Ping, Z. J. Wang, S. L. Hsieh, L. Chin Chen, Y. J. Tseng, C. W. Hsu, F. Lai, "The keyword-based and semantic-driven data matching approach for assisting structuralizing the textual clinical documents," 3rd International Conference on Biomedical Engineering and Informatics (BMEI), vol. 6, Oct 2010, pp. 2532-2535.
[15] Y. S. P. Chiou, F. Y. M. Lure, P. A. Ligomenides, "Neural-knowledge base object detection in Hybrid Lung Nodule Detection (HLND) system," IEEE World Congress on Computational Intelligence, 1994 IEEE International Conference, vol. 7, Jul 1994, pp. 4180-4185.
[16] R. M. Patton, B. G. Beckerman, T. E. Potok, "Learning cue phrase patterns from radiology reports using a genetic algorithm," Biomedical Science & Engineering Conference, Mar 2009, pp. 1-4.
[17] W. Shengyang, C. P. Langlotz, P. Lakhani, L. H. Ungar, "Extracting templates from radiology reports using sequence alignment," Bioinformatics and Biomedicine Workshop, IEEE International Conference, Nov 2009, pp. 320-324.
[18] C. Friedman, J. J. Cimino, S. Johnson, "A conceptual model for clinical radiology reports," Proceedings of the 17th Annual Symposium on Computer Applications in Medical Care, 1993, pp. 829-833.
[19] C. Friedman, J. Cimino, S. Johnson, "A schema for representing medical language applied to clinical radiology," Journal of the American Medical Informatics Association, vol. 1, Jun 1994, pp. 233-248.
[20] J. F. Sowa, "Conceptual Structures," Addison Wesley, Reading, Mass., 1984.
[21] J. J. Cimino, G. Hripcsak, S. B. Johnson, Clayton, "Designing an Introspective, Multipurpose, Controlled Medical Vocabulary," Proceedings of the Thirteenth Annual Symposium on Computer Applications in Medical Care, New York: IEEE Computer Society Press, 1989, pp. 513-518.
[22] N. Zhong, Y. Li, S. T. Wu, "Effective Pattern Discovery for Text Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 24, January 2012, pp. 30-44.
[23] Q. Qiang, Q. Jiangnan, S. Chenyan, W. Yanzhang, "Graph-Based Knowledge Representation Model and Pattern Retrieval," IEEE Computer Society (FSKD 5), vol. 5, Oct 2008, pp. 541-545.
[24] Solka, L. Jeffrey, "Text data mining: Theory and Methods," Statistics Surveys, vol. 2, July 2008, pp. 94-112.
[25] H. S. Oh, J. B. Kim, S. H. Myaeng, "Extracting Targets and Attributes of Medical Findings from Radiology Reports in a mixture of Languages," ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2011, pp. 550-552.
[26] K. Miyoung, J. Sungwon, C. Jinwook, "Information extraction from radiology reports mingled two languages," 7th International Workshop, Jun 2005, pp. 218-223.
[27] I. Solti, C. R. Cooke, F. Xia, M. M. Wurfel, "Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches," IEEE International Conference on Bioinformatics and Biomedicine, November 2009, pp. 314-319.
[28] J. Friedlin, M. Mahoui, J. Jones, P. Jamieson, "Knowledge Discovery and Data Mining of Free Text Radiology Reports," First IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology, 2011, pp. 89-96.
