
Histopathology 1999, 35, 461–467

A neural network approach to the biopsy diagnosis of early acute renal transplant rejection
P N Furness, J Levesley,1 Z Luo,1 N Taub,2 J I Kazi,3 W D Bates4 & M L Nicholson5
Departments of Pathology and 5Surgery, Leicester General Hospital; 1Department of Mathematics and Computer Science,
University of Leicester; 2Department of Epidemiology & Public Health, University of Leicester, UK; 3Department of
Pathology, Sindh Institute of Urology and Transplantation, Karachi, Pakistan; and 4Department of Pathology, John Radcliffe
Hospital, Oxford, UK

Date of submission 8 May 1998


Accepted for publication 25 April 1999

Furness P N, Levesley J, Luo Z, Taub N, Kazi J I, Bates W D & Nicholson M L


(1999) Histopathology 35, 461–467
A neural network approach to the biopsy diagnosis of early acute renal transplant rejection

Aims: To develop and test a neural network to assist in the histological diagnosis of early acute renal allograft rejection.
Methods and results: We used three sets of biopsies to train and test the network: 100 'routine' biopsies from Leicester; 21 selected difficult biopsies which had already been evaluated by most of the renal transplant pathologists in the UK, in a study of the Banff classification of allograft pathology; and 25 cases which had been classified as 'borderline' according to the Banff classification in a review of transplant biopsies from Oxford. The correct diagnosis for each biopsy was defined by careful retrospective clinical review. Biopsies where this review did not provide a clear diagnosis were excluded. Each biopsy was graded for 12 histological features and the data were entered into a simple single layer perceptron network, designed using the MATLAB Neural Network Toolbox. Results were compared with logistic regression using the same data, and with 'conventional' histological diagnosis. If the network was trained only with the 100 'routine' cases, its performance with either of the other sets was poor. However, if either of the 'difficult' sets was added to the training group, testing with the other 'difficult' group improved dramatically; 19 of the 21 'Banff' study cases were diagnosed correctly. This was achieved using observations made by a trainee pathologist. The result is better than was achieved by any of the many experienced pathologists who had previously seen these biopsies (maximum 18/21 correct), and is considerably better than that achieved by using logistic regression with the same data.
Conclusion: A neural network can provide a considerable improvement in the diagnosis of early acute allograft rejection, though further development work will be needed before this becomes a routine diagnostic tool. The selection of cases used to train the network is crucial to the quality of its performance. There is scope to improve the system further by incorporating clinical information. Other related areas where this approach is likely to be of value are discussed.
Keywords: allograft, Banff, biopsy, neural network, rejection

Address for correspondence: Dr P N Furness, Department of Pathology, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK. e-mail: pnf1@le.ac.uk

Introduction

Acute renal allograft rejection requires a rapid increase in immunosuppression if the graft is not to be lost; but in the early stages of rejection the diagnosis is often difficult.1 An increase in serum creatinine may be detected, but it is not at all specific; there are many causes of impaired renal function, including excessive levels of some immunosuppressive drugs. Furthermore, in grafts with delayed primary function due to ischaemic damage, serum creatinine is inevitably elevated and so provides no indication of rejection. Alternative approaches to the diagnosis of rejection, including fine needle aspiration2 and urine cytology3, have not proved popular, and the main approach remains histological assessment of a needle biopsy.

Unfortunately the histological changes of acute rejection develop gradually over hours and days, and in early cases (which are most amenable to treatment) the diagnosis can be very difficult. Furthermore, some histological features of acute rejection can be seen in protocol biopsies from stable grafts.4
The development of the Banff classification of renal transplant pathology5 raised hopes that procedures for the evaluation of renal transplant biopsies would be harmonized and that the accuracy of diagnosis of acute rejection would be improved. A recent study of the Banff scheme showed that the former aim has been achieved, but the latter has not.6 Twenty-one carefully selected 'difficult' transplant biopsies were circulated around the majority of pathologists who report renal transplant biopsies in the UK. Using the Banff classification produced more consistent diagnoses than a conventional approach, but the number of correct diagnoses (as judged against a retrospective clinical review) was not improved.
We argued that this disappointing result arose because in the diagnosis of early acute rejection, the Banff classification concentrates on just one feature, 'tubulitis'. Other features which have been informally considered during a 'conventional' evaluation of a transplant biopsy are ignored.
The human brain is quite good at integrating disparate pieces of information to come to a decision in an informal way, but it is not consistent. To add more histological features and ask pathologists to calculate the probability of rejection in a systematic, reproducible manner would be impractical. Furthermore, there is little more than 'expert opinion' to indicate what weight should be attributed to each histological feature.
We proposed that a computer-based system could allow input of more varied data without losing reproducibility of assessment. In the context of transplant rejection, this method has the advantage that subsequent review of the clinical history can, in most cases, provide an unequivocal diagnosis of rejection or not rejection. This is invaluable in training and in testing such a system. We have already shown that this sort of data integration can be achieved using a Bayesian belief network, which is a form of inference network.7 This approach has the disadvantage of relative inflexibility, as the 'importance' attached to each histological feature has to be calculated and programmed into the network at the outset. A neural network raises the possibility of greater flexibility; the process of 'training' a neural network would automatically calculate what 'weight' should be allocated to each histological feature. We therefore sought to develop a simple neural network which might assist the decision-making process in the diagnosis of acute renal transplant rejection.

Materials and methods

HISTOLOGICAL FEATURES EXAMINED

A standard set of histological features was recorded for each biopsy by a single observer. The list of features is given in Table 1. Where a 'Banff' definition of a feature was available, we used that definition. For other features, we devised our own grading system as shown in Table 1, after review of the range of appearances to be seen in the biopsies to be studied.

CLINICAL CASES

In a study of this type it is essential to define how the correct diagnosis is identified. We used a retrospective review of the clinical casenotes. Rejection was defined as a rise of serum creatinine of at least 15% in the week preceding biopsy followed either by a fall to within 5% of baseline within 7 days of treatment, or by progression to loss of the graft by rejection. Cases where such changes could have been due to changes in hydration were excluded. Non-rejection was defined as a 'protocol' biopsy with a change of serum creatinine of less than 5% in the week following biopsy with no change in immunosuppression, or a rise in creatinine which was clearly identifiable as a consequence of another condition, and which responded to treatment of that condition without increasing immunosuppression.6

TRAINING THE NETWORK

To train the network, we took a sequential series of transplant biopsies from the files of the pathology department at Leicester General Hospital. Biopsies from grafts which had been in situ for 6 months or more were excluded. The details were passed to a member of the clinical team (MLN), who excluded cases where a subsequent review of the clinical notes did not provide a clear diagnosis or exclusion of acute rejection. In this way we continued through the files until we had 100 suitable cases: 43 cases of definite rejection and 57 of definite 'not rejection'. Sections were then withdrawn from the files and the severity of all 12 features shown in Table 1 was recorded for each biopsy by a single observer (JIK).
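The retrospective labelling criteria in the Clinical cases section above are algorithmic enough to be written down directly. The sketch below is our own illustrative rendering of those criteria, assuming pre-computed creatinine changes; the function name, argument names and the None-for-excluded convention are ours, not part of the original study.

```python
from typing import Optional

def retrospective_label(pre_biopsy_rise_pct: float,
                        fell_to_within_5pct_in_7_days: bool,
                        graft_lost_to_rejection: bool,
                        hydration_change_possible: bool,
                        post_biopsy_change_pct: float,
                        immunosuppression_increased: bool,
                        other_condition_explains_rise: bool) -> Optional[str]:
    """Illustrative encoding of the retrospective labelling criteria.

    Returns 'rejection', 'not_rejection', or None for an excluded
    (equivocal) case.
    """
    # Rejection: creatinine rise of at least 15% in the week before biopsy,
    # followed by a fall to within 5% of baseline within 7 days of treatment,
    # or by loss of the graft to rejection.
    if pre_biopsy_rise_pct >= 15:
        if hydration_change_possible:
            return None  # rise could reflect hydration changes: case excluded
        if fell_to_within_5pct_in_7_days or graft_lost_to_rejection:
            return "rejection"
    # Non-rejection: a 'protocol' biopsy with <5% creatinine change in the
    # week after biopsy and no change in immunosuppression, or a rise clearly
    # caused by another condition that resolved without more immunosuppression.
    if abs(post_biopsy_change_pct) < 5 and not immunosuppression_increased:
        return "not_rejection"
    if other_condition_explains_rise and not immunosuppression_increased:
        return "not_rejection"
    return None  # neither definition met: case excluded from the study
```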

Table 1. Histological features (feature: scoring system)

Tubulitis: 0–3, as in Banff classification5
Intimal arteritis: 0–3, as in Banff classification
Interstitial infiltrates: percentage area of biopsy showing obvious interstitial infiltration by lymphocytes
Oedema: percentage area of biopsy showing obvious interstitial oedema
Interstitial haemorrhage: 0–3 (absent; 1–25%; 26–50%; more than 50% of area)
Acute glomerulitis: 0–3, as in Banff classification
Activated lymphocytes: 0–3 (absent; 1–25% of lymphocytes; 26–50% of lymphocytes; over 50% of lymphocytes)
Venulitis: 0–3 (absent; scanty lymphocytes, few venules; abundant lymphocytes, all venules)
Eosinophils: number per single high-power field in most heavily infiltrated area
Plasma cells: number per single high-power field in most heavily infiltrated area
Arterial endothelial mononuclear cell adherence: present or absent (cells adherent to luminal surface of endothelium; c/f intimal arteritis)
Venous endothelial mononuclear cell adherence: present or absent (cells adherent to luminal surface of endothelium; c/f venulitis)
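Table 1 effectively defines a 12-element input vector per biopsy. The snippet below is our own illustrative encoding of those scales; the feature names and the to_vector helper are hypothetical and are not taken from the study's MATLAB implementation.

```python
# Illustrative encoding of the 12 histological features in Table 1.
# Ordinal Banff-style grades are 0-3; percentages and counts are raw numbers;
# the two endothelial-adherence features are 0/1 (absent/present).
FEATURES = [
    "tubulitis",                       # 0-3 (Banff)
    "intimal_arteritis",               # 0-3 (Banff)
    "interstitial_infiltrate_pct",     # % of biopsy area
    "oedema_pct",                      # % of biopsy area
    "interstitial_haemorrhage",        # 0-3
    "acute_glomerulitis",              # 0-3 (Banff)
    "activated_lymphocytes",           # 0-3
    "venulitis",                       # 0-3
    "eosinophils_per_hpf",             # count in most heavily infiltrated area
    "plasma_cells_per_hpf",            # count in most heavily infiltrated area
    "arterial_endothelial_adherence",  # 0/1
    "venous_endothelial_adherence",    # 0/1
]

def to_vector(biopsy: dict) -> list:
    """Map one graded biopsy (a dict keyed by the names above) to the
    fixed-order input vector fed to the network."""
    return [float(biopsy[name]) for name in FEATURES]
```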

TESTING THE NETWORK

Testing the network with a subset of the original 100 cases did not prove to be taxing; complete discrimination was possible using only two features, the weight of the interstitial infiltrates and the extent of the interstitial oedema. We attributed this to the fact that these biopsies were unselected, many having histologically obvious diagnoses. We therefore tested the network with two other sets of transplant biopsies which had been selected to represent difficult diagnostic problems, and which had previously been subjected to intensive study.
We used 21 selected cases which had previously been used in the UK study of the Banff classification of transplant pathology (the 'Banff' cases). These had initially been selected because they had caused diagnostic difficulty for one of the authors (PNF), but a retrospective clinical review had given a clear diagnosis in every case; they included 10 cases of rejection and 11 cases which were not rejection. They had all been seen by the majority of the renal transplant pathologists of the UK, so we had a measure of how well experienced pathologists might be expected to perform in their evaluation.6 They had also been reviewed by two transplant pathologists in the laboratory of the chief architect of the Banff classification of transplant pathology.5
The second group of 'difficult' cases was selected from the files of the Department of Cellular Pathology, John Radcliffe Hospital, Oxford, as part of a major review of that unit's transplant biopsies, undertaken by WDB (the 'Oxford' cases). They had been graded as 'borderline' according to the Banff classification, but subsequent clinical review had provided a clear diagnosis of rejection (19 cases) or not rejection (four cases).
For all these cases, a single H & E stained section and a single PAS section were studied by a single observer (JIK), in the absence of any clinical information beyond the time after transplantation. For each biopsy a 'score' was allocated for each of the features listed in Table 1, in accordance with the definitions provided therein.

It should be recognized that the 100 'training' cases were an unselected series, but the other two sets were specifically chosen because they had caused diagnostic difficulty.

NETWORK CONSTRUCTION AND TRAINING

The network used was a simple single layer perceptron network available from the MATLAB Neural Network Toolbox. We take a weighted average of the inputs (Figure 1). Then for some threshold b, for the hypothesis that 'this is acute rejection' we have: if P < b then reject the hypothesis; if P > b then accept the hypothesis. The equation

\[ \sum_{k=1}^{n} w_k I_k = b \]

in Figure 1 is that of a hyperplane in n-dimensional space, where the weights w_k, k = 1, ..., n, orientate the plane. The network learns by finding weights so that the known answers are obtained on a set of learning data. In this case acceptance and rejection can be distinguished by the plane. With n = 2 this amounts to accept and reject being separated by a line (e.g. Figure 2). Furthermore, the certainty with which the hypothesis is accepted or rejected can be estimated by measuring the distance of the data point in question from the separating plane.

Figure 1. A simple diagram of the layout of the neural network. The central equation is explained in the text.

Figure 2. An illustration of the n-dimensional hyperplane constructed by the network. The n observations made for each biopsy place it in a defined position in n-dimensional space; in this illustration, for clarity, n = 2. The position of the point on one side of the hyperplane defines whether the network considers this to be a case of rejection or not. The distance from the hyperplane provides a measure of the confidence with which the diagnosis has been made.

In order to provide further network training with more difficult cases, we subsequently repeated the exercise after having added the Oxford cases to the training group; we tested this network with the Banff cases. We then trained the network with data from the initial 100 cases and the Banff cases, and tested the result with the Oxford cases.
It has been suggested that neural networks, especially relatively simple ones such as this, may provide no advantage over more conventional statistical analysis.8 We therefore analysed the same data using logistic regression.9
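For readers wishing to reproduce the approach, the sketch below implements a comparable single-layer perceptron in Python with NumPy rather than the MATLAB Neural Network Toolbox used in the study; the learning rule, learning rate, epoch count and the scikit-learn logistic regression comparison are illustrative assumptions, not parameters reported here. The confidence function returns the signed distance from the separating hyperplane described above.

```python
import numpy as np

def train_perceptron(X, y, epochs=200, lr=0.1):
    """Single-layer perceptron: learn weights w and threshold b such that
    sum_k w_k * I_k > b predicts rejection (y=1), otherwise non-rejection."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if np.dot(w, x_i) > b else 0
            err = y_i - pred              # -1, 0 or +1
            w += lr * err * x_i           # standard perceptron update
            b -= lr * err                 # raising b makes 'rejection' harder
    return w, b

def predict(w, b, X):
    return (X @ w > b).astype(int)

def confidence(w, b, X):
    """Signed distance of each case from the hyperplane sum_k w_k * I_k = b;
    a larger magnitude indicates a more confident classification."""
    return (X @ w - b) / (np.linalg.norm(w) + 1e-12)

if __name__ == "__main__":
    # Hypothetical data shapes only: 100 training biopsies x 12 features.
    rng = np.random.default_rng(0)
    X_train = rng.random((100, 12))
    y_train = rng.integers(0, 2, 100)
    w, b = train_perceptron(X_train, y_train)
    print(predict(w, b, X_train[:5]), confidence(w, b, X_train[:5]))

    # Comparison model on the same feature matrix (our illustrative choice).
    from sklearn.linear_model import LogisticRegression
    print(LogisticRegression(max_iter=1000).fit(X_train, y_train)
          .score(X_train, y_train))
```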
Results

After training with the 100 'training' cases, we tested the network with a randomly selected subset of 20 of those cases. This did not prove to be a taxing exercise; all 20 were correctly diagnosed.

Table 2. Train with 100 random cases; test with 'Banff' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            2             0             2
           Non-rejection        9            10            19
Total                          11            10            21


Table 3. Train with 100 random cases; test with 'Oxford' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            0             0             0
           Non-rejection       21             4            25
Total                          21             4            25

Table 5. Train with 100 random cases and 'Oxford' set; test with 'Banff' set using logistic regression

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            7             8            15
           Non-rejection        4             2             6
Total                          11            10            21

However, when testing the network with the more difficult sets, which had been selected because the diagnosis or exclusion of rejection had been difficult, the performance of the network was very disappointing. With the 21 'Banff' cases, the results are shown in Table 2. Only 11 correct diagnoses were achieved.
With the Oxford set, results are shown in Table 3. Only 4/25 were correct, which is fewer than might be expected by chance. With both sets the network's errors were exclusively under-diagnosis of rejection, as one might expect as a consequence of training with 'obvious' cases of rejection, but testing with less severe cases. At this stage the network was assigning most significance to the weight of the interstitial infiltrates.
However, when the network had been trained not only with the 100 training cases but also with the 25 'Oxford' cases, its success rate with the 'Banff' cases improved to 19/21 (Table 4).
This compares favourably with the highest number of correct diagnoses provided by any single UK pathologist (18/21),6 with the number of correct diagnoses produced when the 'Banff' cases were reviewed in the laboratory of the main architect of the Banff classification (15/21), and very favourably with the mean number of correct diagnoses provided by all the UK pathologists (64.8% using a 'conventional' diagnostic approach and 63.3% using the Banff classification).6 It also compares very favourably with the results of logistic regression, when applied to the same data (Table 5).
If the network was trained with the 100 training cases and with the 21 Banff cases, the number of correct diagnoses achieved when testing with the Oxford group improved to 24/25 (Table 6).
Again, this compares favourably with the results of using logistic regression with the same data (Table 7).
After training with the more difficult cases, the network was using all the histological features except acute glomerulitis. To our surprise, relatively little weight was assigned to tubulitis, which is the feature accorded the most prominence in the Banff classification of transplant pathology. This is probably because the test cases were selected as having caused diagnostic difficulty; by definition, if tubulitis had permitted a clear diagnostic distinction, such diagnostic difficulty would not have arisen.

Discussion

Neural networks have been proposed as useful tools in decision-making in a variety of medical applications for a number of years (reviewed in 10). A relatively simple form of inference network, the Bayesian belief network, has recently received some prominence in the pathological literature.11,12

Table 4. Train with 100 random cases and 'Oxford' set; test with 'Banff' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            9             0             9
           Non-rejection        2            10            12
Total                          11            10            21

Table 6. Train with 100 random cases and 'Banff' set; test with 'Oxford' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection           21             1            22
           Non-rejection        0             3             3
Total                          21             4            25
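The headline success rates quoted in the text can be read from the diagonals of these confusion matrices; the short check below, our own illustration using the cell values of Table 4, reproduces the 19/21 figure.

```python
# Table 4 as a 2x2 confusion matrix: rows are the network's prediction
# (rejection, non-rejection), columns the clinical diagnosis.
table4 = [[9, 0],
          [2, 10]]
correct = table4[0][0] + table4[1][1]        # correctly classified cases
total = sum(sum(row) for row in table4)
print(f"{correct}/{total} correct ({correct / total:.0%})")  # 19/21 correct (90%)
```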


Table 7. Train with 100 random cases and 'Banff' set; test with 'Oxford' set using logistic regression

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection           16             3            19
           Non-rejection        5             1             6
Total                          21             4            25

Nevertheless, computer-based networks or 'decision support systems' have not found widespread use in routine practice, with the possible exception of automated cervical cytology screening systems.13,14 Two possible explanations are evident. In some applications they produce little or no improvement beyond conventional 'expert opinion' or more conventional statistical approaches such as logistic regression.8 In most applications they require careful data collection and input, which is relatively laborious.
Our results indicate that the first of these objections need not apply in the context of acute allograft rejection. The diagnosis of acute rejection is clinically important. It is usually based mainly on interpretation of the biopsy appearances; such interpretation is often difficult, even for experienced practitioners.6 We have shown that it is possible to use observations made by a trainee histopathologist to address the diagnosis of acute rejection in difficult cases, and thus produce a very high proportion of correct diagnoses. When tested with the 21 'Banff' cases, this approach allowed the trainee pathologist to perform better than any single UK renal transplant pathologist, better than review by staff in the laboratory which initiated the Banff classification of transplant pathology,5 and much better than the average for the UK's transplant pathologists. The results were also better than could be achieved by applying logistic regression to the same data.
It is notable, however, that the quality of the results is crucially dependent on the quality of the network training. In this context it must be admitted that there is a problem with identifying the 'gold standard' diagnosis against which the results of histological examination are judged. In the field of transplant rejection, our approach of retrospective clinical review against preset standards has been widely used; indeed, our standards were more stringent than have sometimes been used. Errors in the 'agreed diagnosis' cannot be excluded completely, but at least we were able to compare the histological diagnosis with some external standard, rather than comparing the histological diagnosis from the network with the histological diagnosis of an expert histopathologist, which could become a self-fulfilling prophecy.
If our agreed retrospective diagnoses are accepted, then training the network with 'easy' cases provided an apparently good performance when tested with the same, 'easy' cases; but the network was relying heavily on the weight of the interstitial lymphocytic infiltrates. Lymphocytic infiltration is a prominent feature of acute rejection, but it is known to be not specific. It is therefore not surprising that at this stage the network did not provide an acceptable performance when tested with 'difficult' cases, and gave an under-diagnosis of rejection. It was necessary first to expose the network to data from comparably difficult cases before good results were achieved.
In this study there was no interobserver variation, as all the observations were made by one individual (JIK), but the sensitivity of the network to the quality of the training data suggests that interobserver variation could be a significant problem. It might be necessary for every pathologist who wishes to use such a system to train it with their own observations. Differences between institutions and populations may also have a major impact. We have some preliminary data which suggests that even if the observer does not change, a move from the UK to Pakistan results in a network becoming unusable until it has been re-trained with local cases (JIK, manuscript in preparation). Although laborious, this could have considerable advantages, as the output would then be related directly and reproducibly to diagnoses made by retrospective clinical review, and problems of interobserver and interinstitution variation could be almost eliminated.
The second objection, of excessive labour requirements, is more difficult to rebut. Our approach requires the systematic, separate evaluation of several different histological features from each biopsy, with subsequent data entry into a computer. The balance of this argument has perhaps been modified by the recent wide acceptance of the Banff classification of transplant pathology, which requires a similar numeric evaluation of several features, including tubulitis, which in practice takes longer to evaluate than any other feature. Other features (such as eosinophil infiltration, lymphocyte activation) are under evaluation for incorporation into the Banff scheme. Some of the features which we used (such as venous endothelial changes) can probably be dropped without loss of diagnostic accuracy. Consequently, the work involved in data acquisition may be little changed by a neural network approach.

Data entry is a separate problem. At present the network is set up using the mathematical data manipulation program MATLAB, and the interface is not sufficiently 'user friendly' to be acceptable to histopathologists; but we hope to be able to produce a dedicated program which is much easier to use.
We have so far considered only the assessment of histological features, but a diagnosis of acute rejection is always made in the context of clinical information. Neural networks are adept at integrating different types of observation, so there is no reason why clinical information should not be incorporated in the same way. For example, time since transplantation, rate of change of serum creatinine, and serum levels of immunosuppressive drugs could all be incorporated. The results would show the benefits of reproducibility and objectivity, and would also provide an evaluation of the significance of each clinical feature as data accumulates. Remaining within the field of transplant pathology, the evaluation of chronic changes would permit a similar approach. Here the importance of developing 'surrogate markers' of chronic rejection has recently been emphasized,15 and the 'Chronic Allograft Damage Index' (CADI) has been promoted.16 A neural network approach would provide a more logical way in which the CADI data could be integrated, giving a more appropriate 'weighting' to the significance of each feature and avoiding the unjustified assumption that the numeric scoring of each feature represents a linear scale. A network would also allow clinical features to be incorporated. The result could be an approach to predicting graft outcome which is collaborative, rather than the competition which currently exists between different approaches.
In conclusion, we have demonstrated that neural network technology can improve dramatically the accuracy of the histological diagnosis of early acute renal allograft rejection. The approach has the potential to remove interobserver variation and to integrate clinical data. The obstacle to translating these improvements to routine practice is the amount of effort which will be required of histopathologists, mathematicians and computer programmers.

Acknowledgements

We are grateful to Dr D R Davies and to Dr M Dunnill for constructive advice on the manuscript and for permitting access to the Oxford case files.

References

1. Furness PN. Advances in the diagnosis of renal transplant rejection. Curr. Diag. Pathol. 1996; 3; 81–90.
2. Hayry P, von Willebrand E. Practical guidelines for fine needle aspiration biopsy of human renal allografts. Ann. Clin. Res. 1981; 13; 288–284.
3. Kyo M, Mihatsch MJ, Gudat F, Dalquen P, Huser B, Thiel G. Renal graft rejection or cyclosporin toxicity? Early diagnosis by a combination of Papanicolaou and immunocytochemical staining of urinary cytology specimens. Transplant 1992; 5; 71–76.
4. Rush DN, Henry SF, Jeffery JR, Schroeder TJ, Gough J. Histological findings in early routine biopsies of stable renal allograft recipients. Transplantation 1994; 57; 208–211.
5. Solez K, Axelsen RA, Benediktsson H et al. International standardization of criteria for the histologic diagnosis of renal allograft rejection: the Banff working classification of kidney transplant pathology. Kidney Int. 1993; 44; 411–422.
6. Furness P, Kirkpatrick U, Taub N, Davies D, Solez K. A UK-wide trial of the Banff classification of renal transplant pathology in routine diagnostic practice. Nephrol. Dial. Transplant. 1997; 12; 995–1000.
7. Kazi JI, Furness PN, Nicholson M. Diagnosis of early acute renal allograft rejection by evaluation of multiple histological features using a Bayesian belief network. J. Clin. Pathol. 1998; 51; 108–113.
8. Cross SS, Bury JP, Stephenson TJ, Harrison RF. Image analysis of low magnification images of fine needle aspirates of the breast produces useful discrimination between benign and malignant cases. Cytopathology 1997; 8; 265–273.
9. Altman D. Practical Statistics for Medical Research. London: Chapman & Hall, 1991: 351–358.
10. Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet 1995; 346; 1075–1079.
11. Montironi R, Whimster VYT, Collan Y, Hamilton PW, Thompson D, Bartels PH. How to develop and use a Bayesian belief network. J. Clin. Pathol. 1996; 49; 194–201.
12. Hamilton PW, Montironi R, Abmayr W et al. Clinical applications of Bayesian belief networks in pathology. Pathologica 1995; 87; 237–245.
13. Ryan MR, Stastny JF, Remmers R, Pedigo MA, Cahill LA, Frable WL. PAPNET directed rescreening of cervicovaginal smears: a study of 101 cases of atypical squamous cells of undetermined significance. Am. J. Clin. Pathol. 1996; 105; 711–718.
14. Kok MR, Boon ME. Consequences of neural network technology for cervical screening: increase in diagnostic consistency and positive scores. Cancer 1996; 78; 112–117.
15. Hunsicker LG, Bennett LE. Design of trials of methods to reduce late renal allograft loss: the price of success. Kidney Int. 1995; 48 (Suppl.); S120–S123.
16. Isoniemi H, Taskinen E, Hayry P. Histological chronic allograft damage index accurately predicts chronic renal allograft rejection. Transplantation 1994; 58; 1195–1198.

