
THE SYNCHRONIZATION OF DISTRIBUTED DATABASES

W. Ed Hammond, Mark J. Straube, and W. W. Stead Duke University Medical Center Box 2914 Durham, NC 27710 Telephone (919) 684-6421 E-mail HAMMOOOl@dukemc FAX (919) 684-8675

ABSTRACT
This paper describes, with examples from a real-world medical environment, major problems of employing distributed, replicated databases for clinical care information. Specific problems discussed include patient identification, synchronization of data transfer among systems relating to clock time and event, distributed concurrency control, and system unavailability. Solutions, where available, are presented.
The Hospital Information System of the 90's appears to be heading in the direction of distributed databases [1,2]. These local databases will contain both data of only local interest and data shared with other locations through a central master database. The master database, which may be real or virtual, is created from data received from the various distributed systems [3].

Duke University Medical Center (DUMC) has two primary information systems. The Duke Hospital Information System (DHIS) is a transaction-oriented system which supports the service-related activities, such as Admission/Discharge/Transfer and Order Entry and Result Reporting, for the Duke Hospital. The Medical Record (TMR) [4] is a comprehensive, computer-based medical record system which functions as the primary information system at a number of settings at Duke, including the Surgical Intensive Care Unit, the Division of Cardiology, the Division of Obstetrics, and the Division of Community Medicine. In addition, TMR serves as the scheduling system for both the Medical and Surgical Private Diagnostic Clinics (PDC). As part of an LAIMS model-testing and implementation project, DUMC has made progress in coordinating applications based upon these two systems into an integrated database. Over the last two years, we have installed a fiber optic backbone which connects most of the buildings within the medical center complex. A number of computers are connected to this network including the IBM 3090-300, which supports DHIS, and several Digital Equipment Corporation VAX computers, which support the various local TMR systems. The networking protocol is TCP/IP based on the Ethernet network model. Data is exchanged among most of the systems connected to the network.

This paper describes the nature of the data flow among the TMR and DHIS systems and identifies some of the problems and issues which we have encountered. We present our solutions to these problems along with our rationale for a particular approach.

Patient Identification

Establishing patient identity in one database. A primary concern of any medical database is the correct identity of the patient. The problem begins with the first patient contact [5]. Within TMR, the person collecting the data, called a data terminal operator (DTO), either types in the patient's identification number (PID) or some part of the patient's name in last name/first name order. An immediate problem occurs in the consistency of recording the patient's name. Most systems record the patient's last name, first name and middle initial. With very popular names, we have observed as many as 20 identical first name, middle initial, last name sets. The suffix, such as Jr., Sr., II, III, etc., may be entered with the last name, with the first name or not at all. If the patient is not physically present, the DTO is limited to the data available. Additional complications result from nicknames and from persons who use their first initial and middle name. These problems can be minimized by capturing the name components separately, particularly the suffix, and by capturing full names. Nicknames and aliases should also be captured.
Within TMR, the names are stored in a name-index file as a B-tree, along with additional data to establish uniquely the patient's identity. This additional data includes date of birth, sex, mother's maiden name, and social security number (SSN). Previous names should be retained with the index and be linked to the current entry.

0195-4210/90/0000/0345$01.00 1990 SCAMC, Inc.

Various TMR distributed databases may use different schemes for the PID. Some settings use the social security number; others use the Duke History Number (HX), a letter followed by 5 sequentially assigned digits; others use a sequentially assigned number; and others use a family number, consisting of a two-digit family position number and a five-digit family number. In all cases, if no number currently exists, a computer will automatically assign a number in the proper format.
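Whatever PID scheme a setting uses, a keyed number can still have its digits transposed. As a minimal sketch, assuming a weighted mod-11 check character (chosen here for illustration only; it is not a format documented for TMR or DHIS), an appended check character lets a program reject any single wrong digit or any swap of two adjacent digits. The function names are hypothetical.

```python
def check_char(digits: str) -> str:
    """Return a mod-11 check character for up to nine decimal digits."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 9:
        raise ValueError("expected 1 to 9 decimal digits")
    # Weights count down to 2 at the last digit, so neighbouring digits
    # always carry different weights and a swap changes the total.
    weights = range(len(digits) + 1, 1, -1)
    total = sum(int(d) * w for d, w in zip(digits, weights))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def add_check(digits: str) -> str:
    """Append the check character to a newly assigned number."""
    return digits + check_char(digits)


def is_valid(pid: str) -> bool:
    """True if the last character is the correct check for the rest."""
    body, check = pid[:-1], pid[-1:]
    try:
        return check_char(body) == check
    except ValueError:
        return False
```

With this scheme `add_check("12345")` gives `"123455"`, and the transposed entry `"124355"` fails `is_valid`, so the lookup can ask the DTO to re-key the number instead of creating a new record.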


At the initial patient contact, the DTO enters the patient name or identifier, and the index is scanned for a match. We suggest the entry be limited to last name and first initial, and in some cases, the computer program enforces this rule for look-up. For name entry, the B-tree is searched for matches. Those "hits" are displayed along with the additional identifying data. The DTO then selects the appropriate patient or creates a new entry. If the patient contact is over the telephone, misspellings of the patient's name are more frequent than with direct contact. A Soundex index will increase the probability of finding a patient who already exists in the database but does not reduce spelling errors on initial entry. The patient should always be asked to spell the name when creating the initial record.
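The Soundex lookup can be sketched as follows. This assumes the standard American Soundex coding; the paper does not say which variant TMR uses, and `soundex` and `build_index` are illustrative names only.

```python
from collections import defaultdict

# Standard Soundex digit groups; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a surname."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeat is coded again
    return (result + "000")[:4]


def build_index(surnames):
    """Group surnames by Soundex code for phonetic look-up."""
    index = defaultdict(list)
    for surname in surnames:
        index[soundex(surname)].append(surname)
    return index
```

A telephone spelling such as "Smyth" then retrieves the existing "Smith" entry because both code to S530; the DTO still has to confirm the hit against the other identifying data.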
If the DTO enters the PID, a direct lookup for the PID occurs. If the PID is present, the patient's name and amplifying data is displayed for acceptance. If the PID is not found, a new record is created. In that case, when the name is entered, the program automatically rescans the name index for a duplication. If matches are found, the DTO is asked to verify that, in fact, this is a new entry to the database.
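The rescan for a possible duplicate might look like the sketch below. It assumes a simple count of agreeing fields with an arbitrary threshold; the field set is a subset of the identifying data listed above, and the class name, function name, and threshold are hypothetical rather than TMR's actual rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexEntry:
    pid: str
    last: str
    first: str
    birth: date
    sex: str
    ssn: str = ""


def possible_duplicates(new, index, min_agree=3):
    """Return existing entries that agree with `new` on enough fields."""
    hits = []
    for old in index:
        agree = sum([
            new.last.lower() == old.last.lower(),
            new.first[:1].lower() == old.first[:1].lower(),
            new.birth == old.birth,
            new.sex == old.sex,
            # An empty SSN is treated as unknown, never as agreement.
            bool(new.ssn) and new.ssn == old.ssn,
        ])
        if agree >= min_agree:
            hits.append((agree, old))
    # Strongest candidates first, so the DTO sees the likeliest match on top.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [old for _, old in hits]
```

Any entries returned would be shown to the DTO, who confirms whether the record really is new.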

Even with these precautions, our existing TMR databases have between 10 and 20% obvious duplications. Many of these duplications result from variations in the spelling of the name, from variations in the format of the name, from the presence or absence of a suffix, and from transposition of digits in the PID. The transposition of digits can be virtually eliminated through the addition of a check digit to the PID. In many cases, carelessness on the part of the DTO is a direct cause of duplication. This error can be corrected by having separate, independent verification of the patient identifying data set at the time of initial data entry.

Maintaining an audit trail of name changes and PID changes is extremely important when data is transferred between systems with lapsed time between contacts. For example, if the PID changes between a laboratory requisition and the result reporting, that linkage is necessary to identify the patient.

Synchronization of patient identity across databases. If patient data is to be transmitted for inclusion into another database, positive identity becomes even more critical. Since the databases had been established independently, initially whenever data was exchanged between systems, the patients had to be matched by the computers from available data. The problem was complicated when different forms of the patient ID were used. For example, DHIS uses the Duke History Number (HX) as the primary ID but also contains a field for the SSN. The PDC database uses the SSN as the primary ID and the HX as a secondary ID. The secondary ID was never collected as carefully as the primary ID. The matching algorithm used the proper patient ID, the last name, the first name, the middle initial (even if the entire middle name was present), and the patient's date of birth. A full match occurred less than half of the time. The date of birth was an interesting field, often differing by only a day or two; at other times the month and occasionally the year were different. Human dynamics were the probable cause: a distraught mother in an emergency room or someone slightly untruthful about their age. Last names change for many reasons (marriage, to avoid payment, to remain anonymous) and must be tracked over time. The best we were able to do with various matchings was about 60-70% of the database. The mismatches appeared as new patients.

Common patient index. The duplication of records and the overhead of human intervention suggest another approach is necessary. The first step is to establish an institution-wide common patient index (CPI) which serves as the master for all patients who have been or will be seen within the Duke setting. This index requires a change in DUMC policy. In the past, patient IDs (history numbers) were not assigned until either the patient or a piece of paper about the patient was physically on-site. This policy increased identification accuracy and avoided "wasting" numbers. As a result, each service and local system had to develop temporary patient identification numbers to handle the patient before the institution assigned a history number.

All new patients will be first looked up in the CPI, and, if present, appropriate data will be downloaded into the local database and a local record established. The CPI will be aware of the local record so that updates can be passed along. If the CPI does not contain the new patient's record, data will be collected at the local setting and passed to the CPI. A temporary ID will be assigned by the CPI and passed back to the local database. The patient data will be verified independently by re-establishing contact with the patient. Special care will be taken to ensure completeness of the patient names and the correct spelling, as well as the correct address. After validation, the ID will become permanent and all local databases will be informed. It will be the responsibility of a local database to pass ID corrections to any other electronic database with which it has shared the patient ID and demographic data. Updates to demographic data will be handled in a similar fashion with data collected in local settings, passed to the CPI, validated, and redistributed to all appropriate local databases. Since this process is in an implementation stage, time will tell its effectiveness.

The CPI can also be used to keep the distributed databases informed of patient activities in other areas. For example, the Division of Cardiology follows all patients who have had a catheterization performed at Duke. If that patient is seen elsewhere at Duke, Cardiology might wish to be informed or have some data automatically down-loaded into their local database. Clearly if the patient died at Duke, Cardiology would like to be informed. The CPI also permits well-defined control of the privacy and security of data through data filters defined within the CPI.
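The CPI registration flow (look up, assign a temporary ID, validate, redistribute) can be sketched as below. This is an illustration under stated assumptions, not the DUMC implementation: the class and method names, the exact-match lookup key, and the `T00001` ID format are all hypothetical.

```python
class LocalDatabase:
    def __init__(self, name):
        self.name = name
        self.records = {}  # patient ID -> {"demo": ..., "status": ...}

    def receive(self, patient_id, demographics, status):
        """Accept a new or updated record pushed by the CPI."""
        self.records[patient_id] = {"demo": dict(demographics), "status": status}


class CommonPatientIndex:
    def __init__(self):
        self.patients = {}  # patient ID -> demo, status, holding databases
        self._counter = 0

    @staticmethod
    def _key(demo):
        # Assumed lookup key; a real index would match far more loosely.
        return (demo["last"].lower(), demo["first"].lower(), demo["birth"])

    def lookup_or_register(self, local, demographics):
        """Download a known patient, or register a new one as temporary."""
        for pid, entry in self.patients.items():
            if self._key(entry["demo"]) == self._key(demographics):
                if local not in entry["holders"]:
                    entry["holders"].append(local)
                local.receive(pid, entry["demo"], entry["status"])
                return pid
        self._counter += 1
        pid = f"T{self._counter:05d}"
        self.patients[pid] = {"demo": dict(demographics),
                              "status": "temporary", "holders": [local]}
        local.receive(pid, demographics, "temporary")
        return pid

    def validate(self, pid, corrections=None):
        """Mark the ID permanent and inform every database holding it."""
        entry = self.patients[pid]
        entry["demo"].update(corrections or {})
        entry["status"] = "permanent"
        for local in entry["holders"]:
            local.receive(pid, entry["demo"], "permanent")
```

Because the index remembers which local databases hold each record, a correction made during validation reaches every holder in one pass.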
Synchronization of Data Entry

Whenever local databanks receive data from other sources, the input of that data must be done in the order in which the data generation occurred [6]. Wiederhold and Qian suggest an excellent model [7] for dealing with problems of synchronization in distributed databases. The first and most obvious problem is the required synchronization of the clocks of all the systems. For example, it's more than embarrassing if lab data is timestamped five minutes after a patient has been recorded as dead. Synchronization of multiple clocks is difficult. One clock must be made the master clock and conduct real-time, synchronized downloads to each system, probably daily. At the present time, the clocks are synchronized manually and drift apart, even among the VAXes.

Even though the data entries are made on one system sequentially, due to internal queuing, they may be transmitted out of sequence to another system. For example, an admit to a bed may be received in TMR before a discharge of another patient from that same bed, even though DHIS forces the correct sequence on the DHIS side. The interface program to TMR buffers any data received out-of-sequence and tries to reprocess the buffered data after any subsequent new data processing. Our experience has been that this procedure is adequate, and the misqueuing corrects itself.
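The buffering behaviour can be illustrated with the bed example. This is a minimal sketch, assuming transactions are simple `(kind, bed, patient)` tuples and that a transaction is "out of sequence" when the bed is not in the state it expects; `BedInterface` and its methods are hypothetical names, not the TMR interface program.

```python
class BedInterface:
    def __init__(self, beds=None):
        self.beds = dict(beds or {})  # bed -> patient currently in it
        self.pending = []             # transactions that could not be applied yet

    def _apply(self, txn):
        """Apply one transaction; return False if it arrived too early."""
        kind, bed, patient = txn
        if kind == "admit":
            if bed in self.beds:
                return False          # bed still occupied: discharge not seen yet
            self.beds[bed] = patient
            return True
        if kind == "discharge":
            if self.beds.get(bed) != patient:
                return False          # patient not in this bed yet
            del self.beds[bed]
            return True
        raise ValueError(f"unknown transaction kind: {kind!r}")

    def receive(self, txn):
        """Process new data, then retry anything buffered earlier."""
        if not self._apply(txn):
            self.pending.append(txn)
            return
        progress = True
        while progress and self.pending:
            progress = False
            for held in list(self.pending):
                if self._apply(held):
                    self.pending.remove(held)
                    progress = True
```

If the admit for a bed arrives first it simply waits in `pending`; the later discharge is applied and the retry pass then admits the new patient, which matches the self-correcting behaviour described above.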

A second problem in the data insertion process occurs when the receiving record is locked for update from another source. The data must again be buffered, but, in this case, this data must be entered into the record before any additional data can be added. Further data entry must be queued and processed in sequence. For example, if lab data is sent from DHIS to TMR, and the record is locked on the TMR side, the DHIS data must be buffered. Any additional data arriving from DHIS for this patient must be queued for insertion after the previous entries until all data is processed.

Another problem relating to timing occurs when access to more than one system does not show the exact same status. In one application, an Inpatient List is maintained in TMR, using data transmitted from DHIS and from data entry directly into TMR. This list is available in both TMR and on DHIS. The DHIS copy is updated only twice per day, and a user may look at the TMR copy one time and the DHIS copy another and become concerned that the lists are different. Unfortunately, confidence in the system is eroded by this perception of error. Even though we print the time of update, this added information does little to solve the problem.

Synchronization of Hardware Up-time

As long as computer systems do not guarantee continuous up-time, the transfer of data between systems must be done in such a manner that each system can continue to work while another system is down, but that transfer will catch up when both systems are up. We support three modes of network interaction that meet this requirement. In the real-time, interactive data transfer which is requested by a user, for example, a demographic data download, we return to the user program a status which indicates the other system is not available. The user program then must decide what action to take. Frequently, the data is entered manually, and a flag is set for the system to verify the data automatically later using a batch program. For batch transfers based on a polling approach, a status bit is returned saying a system is not available, and the receiving system enters a wait state and retries after a specified delay. If the receiving system is unavailable, the sending system writes the data to its own disk where it may be later retrieved by the receiving system when it returns to an available state.

Frequently, knowledge of the tasks to be performed can be built into the two systems to optimize the interaction. For example, in the interaction for lab orders and result entry between TMR on one computer and a laboratory system (TLS) on another computer, the down-time required for the nightly backup can be anticipated. Preorders for the next day are sent from the TMR computer before backup and are buffered on the TLS computer. After midnight, the preorders are activated by TLS. If the TMR computer goes down unexpectedly, work can continue by entering the orders directly into TLS. When the TMR computer comes back on line, the orders are transmitted to TMR and are automatically entered into the TMR record, along with the results when they are available. A message informing the user of the order entry is printed at the printer at the normal ordering site.

Dynamic Interactions Between Systems

In one application currently under development at Duke, diagnostic orders entered into DHIS are transmitted to TMR for scheduling. In TMR, the study orders are stored in a fixed-length, direct-access file, sorted by test and by location. The orders are transmitted from DHIS as entered in real-time. When scheduling the orders, the patients' names are listed on the left side of the screen as shown in Figure 1.

[Figure 1 screen display not reproduced: a TMR screen titled "SCHEDULING ECHO FOR 06/10/90" that lists the patients awaiting scheduling on the left and the time slots with the patients already booked on the right, followed by the prompt "Select option <S#,D#,F,B,C,Q>->".]

FIGURE 1

Scheduling diagnostic studies for Cardiology Diagnostic Unit. A resource may be booked by selecting a patient, a resource and a time.

The available resources and currently-scheduled patients are shown on the right. TMR shows a group of patients on the screen. In the meantime, new information may have been received from DHIS which changes the order of patients and the counts. As one pages from one screen to another, the list and order of patients may dynamically change. Users may become confused. To avoid this problem, we suspend transfer of newly entered orders while scheduling occurs. One additional problem occurs when a study is scheduled and subsequently cancelled from DHIS. In this case, the cancellation must delete the scheduling data from the appointment system and from the patient's record.
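One way to picture the suspension of incoming orders during scheduling is the sketch below. It assumes orders are identified by a simple ID and that held orders are released in arrival order when scheduling ends; `OrderFeed` and its methods are hypothetical, and removal of the study from the patient's record on cancellation is left out.

```python
class OrderFeed:
    def __init__(self):
        self.worklist = []      # orders shown on the scheduling screen
        self.held = []          # orders received while scheduling is in progress
        self.appointments = {}  # order ID -> (resource, time)
        self.scheduling = False

    def receive(self, order_id):
        """Accept an order from the sending system."""
        target = self.held if self.scheduling else self.worklist
        target.append(order_id)

    def begin_scheduling(self):
        self.scheduling = True  # freeze the list the user is paging through

    def book(self, order_id, resource, time):
        self.worklist.remove(order_id)
        self.appointments[order_id] = (resource, time)

    def end_scheduling(self):
        """Release the orders that arrived during the session."""
        self.scheduling = False
        self.worklist.extend(self.held)
        self.held.clear()

    def cancel(self, order_id):
        """Remove a cancelled study wherever it currently sits."""
        for pending in (self.worklist, self.held):
            if order_id in pending:
                pending.remove(order_id)
        self.appointments.pop(order_id, None)
```

While a session is open the screen contents stay stable from page to page; the cost is that newly entered orders are not visible until the session ends.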

Dictionary Synchronization
At the present time, the various systems using TMR at Duke use separate dictionaries. The advantage of this approach is that each user group controls the content and the timing of additions, updates, and deletions. The disadvantage lies in trying to map items from one system to another. At the present time, translation tables are required when sending data from one system to another. In one application, appointment data is sent from Cardiology Associates to the Medical PDC appointment system for coordination of all appointments. The systems have different dictionaries and therefore different codes for providers, clinics, studies, referring physicians, diagnoses, etc. Translation tables must be kept up-to-date and the turn-on of new entries must be synchronized across systems. Time schedules, clinic schedules, and provider schedules must be redundantly entered into both systems.
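A translation table of the kind described can be sketched as a mapping keyed by dictionary category and source code. The class name, function name, category names, and codes below are invented for illustration; an unmapped code is raised as an error so that an entry missing from the table is noticed rather than passed through silently.

```python
class TranslationTable:
    """Maps codes from one system's dictionary to another's."""

    def __init__(self):
        self._codes = {}  # (category, source code) -> target code

    def add(self, category, source_code, target_code):
        self._codes[(category, source_code)] = target_code

    def translate(self, category, source_code):
        try:
            return self._codes[(category, source_code)]
        except KeyError:
            raise LookupError(
                f"no {category} mapping for {source_code!r}") from None


def translate_appointment(appointment, table):
    """Translate every coded field of an appointment record."""
    return {category: table.translate(category, code)
            for category, code in appointment.items()}
```

For example, an appointment carrying the sending system's provider and clinic codes is rewritten field by field into the receiving system's codes before it is sent.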
Another application relating to dictionary synchronization arose with the referring physician database. The Duke Medical Center, as part of an extensive outreach program, captures data on one to four referring physicians for each referral or admission to Duke. Among other uses, information concerning the patient's care at Duke is automatically sent to the referring physician as it becomes available. Since patients flow throughout the Duke setting, it was necessary to establish a common referring physician database, currently available through DHIS. Duke's approach was to purchase a commercial database which lists all physicians along with addresses and specialties for North Carolina and 5 neighboring states. The number of physicians in the database is approximately 70,000 and includes many physicians who will never send patients to Duke. Within the TMR systems, the options were to copy the entire database and make it part of each user's dictionary; to request information across the network at each reference to the referring physician; or to establish a local subset of the database. The first option was eliminated because of expense in both storage and search time. The second option was eliminated because of the frequent reference to the referring physician, the time, although very fast, of accessing that data across the network, and the dependence on a second system being available. The option selected established local referring physician databases which were synchronized and validated through a master referring physician database.
Within a TMR system, the local dictionary is searched for the referring physician. If that name is not found locally, the master database is queried. If the name is found, that data is downloaded and added to the local dictionary, creating a new code in that dictionary and making the appropriate entry into the translation table. If the name is not found, an entry is made in the local dictionary and the data is transmitted at the same time to the master database. A master number is assigned and the data is flagged as not validated. Later, the master data management group independently verifies the data. Notification of verification, along with any corrections, is transmitted to all users, where local dictionaries are updated. If the temporary entry was in fact already in the database, the assigned number is inactivated, and appropriate users are informed of the error and provided the correct referring physician identification.
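The local-then-master lookup can be sketched as follows. This is an illustration only: the class names, the code formats, and the exact-name match are assumptions, and later verification by the data management group is represented just by the `validated` flag.

```python
class MasterDirectory:
    def __init__(self):
        self.by_name = {}  # name -> master record
        self._next = 1

    def find(self, name):
        return self.by_name.get(name)

    def add(self, name, data, validated):
        """Assign a master number and store the record."""
        record = {"master_no": self._next, "name": name,
                  "data": dict(data), "validated": validated}
        self._next += 1
        self.by_name[name] = record
        return record


class LocalDictionary:
    def __init__(self, master):
        self.master = master
        self.codes = {}        # name -> local code
        self.translation = {}  # local code -> master number

    def lookup(self, name, data=None):
        """Return the local code, consulting the master only on a local miss."""
        if name in self.codes:
            return self.codes[name]
        record = self.master.find(name)
        if record is None:
            # New to both dictionaries: register it, flagged as not validated.
            record = self.master.add(name, data or {}, validated=False)
        code = f"L{len(self.codes) + 1:04d}"
        self.codes[name] = code
        self.translation[code] = record["master_no"]
        return code
```

After the first reference to a physician, later references are answered from the local dictionary, so they neither cross the network nor depend on the master system being up.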

The Division of Cardiology had established a referring physician database which contained approximately 4500 entries. This database had been carefully maintained and was assumed to be accurate. In creating the master referring physician database, the Cardiology database was merged into the purchased database. The computer match was zero. Failure to match was a result of different formats for names, a few misspellings, vastly different abbreviations, and different addresses - office addresses compared to home addresses. The databases were manually merged with over 80% of the referring physicians actually being in both databases. In addition, the purchased database contained an estimated 20% duplication of names.

Upon the initial startup of the system, almost every name created a new entry in spite of the fact that the names were included in the master dictionary. Early problems included incomplete name and address data, syntax errors, inconsistency in format, misunderstanding of how the system worked, and carelessness on the part of the DTO. Over 150 names were verified daily. Over 90% of these were actually already in the master dictionary. After 3 months of operation, the actual verifications are around 30 per day. Less than half of these are true new entries.

Conclusions

The most important component in synchronization of distributed databases is the positive identification of the patient in all systems. This requirement is best met by guaranteeing that the patient information is collected and verified at the initial contact and positively tracked by all systems subsequently. Control must be established and maintained by a single Common Patient Index.

Systems must also be sensitive to timing issues and proper synchronization of data updates to records. Systems must also not depend on computers being on-line at the same time to accomplish the data transfer.
The practical problems encountered in developing local databases which are part of a shared master database concept are difficult to solve unless the involved systems are integrated in function and concept - not just interfaced. Successful interaction requires an understanding of function, procedures, timing and operational characteristics. On the other hand, the benefits of the approach are proving to be most valuable.

References
1. Wilton RW and McCoy JM: An Outpatient Clinic Information System Based on Distributed Database Technology. Proceedings Thirteenth Symposium on Computer Applications in Medical Care, edited by Kingsland LC, New York, New York, IEEE, pp. 372-376, 1989.

2. Margulies DM, Ribitzy R, Elkowitz A, McCallie DP: Implementing an Integrated Hospital Information System at Children's Hospital. Proceedings Thirteenth Symposium on Computer Applications in Medical Care, edited by Kingsland LC, New York, New York, IEEE, pp. 627-631, 1989.

3. Stead WW, Hammond WE: Computer-based Medical Records: The Need for Storing a Single Datum in Multiple Orientations. Proceedings Twelfth Symposium on Computer Applications in Medical Care, edited by Greenes RA, New York, New York, IEEE, pp. 625-629, 1988.

4. Stead WW, Hammond WE: Computer-Based Medical Records: The Centerpiece of TMR. MD Computing 5(5):48-62, 1988.

5. Straube MJ, Hammond WE, Stead WW: Realtime Information Exchange Between Heterogeneous Database Systems: Design Strategy and Lessons Learned During Implementation. Proceedings AAMSI Congress 1988, edited by Hammond WE, Washington, DC, American Association for Medical Systems and Informatics, pp. 202-211, 1989.

6. Hammond WE, Grewal R, Straube MJ, Stead WW: Networking Data Sources for Intensive Care Monitoring. Proceedings AAMSI Congress 1989, edited by Hammond WE, Washington, DC, American Association for Medical Systems and Informatics, pp. 277-281, 1989.

7. Wiederhold G, Qian X-L: Modeling Asynchrony in Distributed Databases. IEEE Data Engineering Conference 3, Los Angeles, Feb. 1987.

Acknowledgements
This work was supported in part by the National Library of Medicine Grant G08LM04613, awarded by the National Institutes of Health, Department of Health and Human Services.
