Aims: To develop and test a neural network to assist in the histological diagnosis of early acute renal allograft rejection.

Methods and results: We used three sets of biopsies to train and test the network: 100 'routine' biopsies from Leicester; 21 selected difficult biopsies which had already been evaluated by most of the renal transplant pathologists in the UK, in a study of the Banff classification of allograft pathology; and 25 cases which had been classified as 'borderline' according to the Banff classification in a review of transplant biopsies from Oxford. The correct diagnosis for each biopsy was defined by careful retrospective clinical review. Biopsies where this review did not provide a clear diagnosis were excluded. Each biopsy was graded for 12 histological features and the data were entered into a simple single-layer perceptron network, designed using the MATLAB neural network toolbox. Results were compared with logistic regression using the same data, and with 'conventional' histological diagnosis. If the network was trained only with the 100 'routine' cases, its performance with either of the other sets was poor. However, if either of the 'difficult' sets was added to the training group, testing with the other 'difficult' group improved dramatically; 19 of the 21 'Banff' study cases were diagnosed correctly. This was achieved using observations made by a trainee pathologist. The result is better than was achieved by any of the many experienced pathologists who had previously seen these biopsies (maximum 18/21 correct), and is considerably better than that achieved by using logistic regression with the same data.

Conclusion: A neural network can provide a considerable improvement in the diagnosis of early acute allograft rejection, though further development work will be needed before this becomes a routine diagnostic tool. The selection of cases used to train the network is crucial to the quality of its performance. There is scope to improve the system further by incorporating clinical information. Other related areas where this approach is likely to be of value are discussed.
Keywords: allograft, Banff, biopsy, neural network, rejection
approach remains histological assessment of a needle biopsy.
Unfortunately the histological changes of acute rejection develop gradually over hours and days, and in early cases (which are most amenable to treatment) the diagnosis can be very difficult. Furthermore, some histological features of acute rejection can be seen in protocol biopsies from stable grafts.4
The development of the Banff classification of renal transplant pathology5 raised hopes that procedures for the evaluation of renal transplant biopsies would be harmonized and that the accuracy of diagnosis of acute rejection would be improved. A recent study of the Banff scheme showed that the former aim has been achieved, but the latter has not.6 Twenty-one carefully selected 'difficult' transplant biopsies were circulated around the majority of pathologists who report renal transplant biopsies in the UK. Using the Banff classification produced more consistent diagnoses than a conventional approach, but the number of correct diagnoses (as judged against a retrospective clinical review) was not improved.
We argued that this disappointing result arose because, in the diagnosis of early acute rejection, the Banff classification concentrates on just one feature, 'tubulitis'. Other features which have been informally considered during a 'conventional' evaluation of a transplant biopsy are ignored.
The human brain is quite good at integrating disparate pieces of information to come to a decision in an informal way, but it is not consistent. To add more histological features and ask pathologists to calculate the probability of rejection in a systematic, reproducible manner would be impractical. Furthermore, there is little more than 'expert opinion' to indicate what weight should be attributed to each histological feature.
We proposed that a computer-based system could allow input of more varied data without losing reproducibility of assessment. In the context of transplant rejection, this method has the advantage that subsequent review of the clinical history can, in most cases, provide an unequivocal diagnosis of rejection or not rejection. This is invaluable in training and in testing such a system. We have already shown that this sort of data integration can be achieved using a Bayesian belief network, which is a form of inference network.7 This approach has the disadvantage of relative inflexibility, as the 'importance' attached to each histological feature has to be calculated and programmed into the network at the outset. A neural network raises the possibility of greater flexibility; the process of 'training' a neural network would automatically calculate what 'weight' should be allocated to each histological feature. We therefore sought to develop a simple neural network which might assist the decision-making process in the diagnosis of acute renal transplant rejection.

Materials and methods

HISTOLOGICAL FEATURES EXAMINED

A standard set of histological features was recorded for each biopsy by a single observer. The list of features is given in Table 1. Where a 'Banff' definition of a feature was available, we used that definition. For other features, we devised our own grading system as shown in Table 1, after review of the range of appearances to be seen in the biopsies to be studied.

CLINICAL CASES

In a study of this type it is essential to define how the correct diagnosis is identified. We used a retrospective review of the clinical casenotes. Rejection was defined as a rise of serum creatinine of at least 15% in the week preceding biopsy, followed either by a fall to within 5% of baseline within 7 days of treatment, or by progression to loss of the graft by rejection. Cases where such changes could have been due to changes in hydration were excluded. Non-rejection was defined as a 'protocol' biopsy with a change of serum creatinine of less than 5% in the week following biopsy with no change in immunosuppression, or a rise in creatinine which was clearly identifiable as a consequence of another condition, and which responded to treatment of that condition without increasing immunosuppression.6

TRAINING THE NETWORK

To train the network, we took a sequential series of transplant biopsies from the files of the pathology department at Leicester General Hospital. Biopsies from grafts which had been in situ for 6 months or more were excluded. The details were passed to a member of the clinical team (MLN), who excluded cases where a subsequent review of the clinical notes did not provide a clear diagnosis or exclusion of acute rejection. In this way we continued through the files until we had 100 suitable cases: 43 cases of definite rejection and 57 of definite 'not rejection'. Sections were then withdrawn from the files and the severity of all 12 features shown in Table 1 was recorded for each biopsy by a single observer (JIK).
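The retrospective 'gold standard' defined under Clinical cases can be written out as an explicit rule. The sketch below is a deliberately simplified illustration, not the authors' procedure: the function name and argument structure are hypothetical, the pre- and post-biopsy creatinine windows are collapsed into single values, and the hydration and 'other condition' provisions are omitted. Only the 15% and 5% thresholds and the 7-day response window come from the text.

```python
def classify_episode(baseline_cr: float,
                     pre_biopsy_cr: float,
                     post_treatment_cr: float,
                     graft_lost_to_rejection: bool) -> str:
    """Return 'rejection', 'non-rejection' or 'indeterminate' for one biopsy.

    baseline_cr        serum creatinine before any rise (umol/L)
    pre_biopsy_cr      creatinine at biopsy, after the week preceding it
    post_treatment_cr  creatinine within 7 days of anti-rejection treatment
    """
    rise = (pre_biopsy_cr - baseline_cr) / baseline_cr
    if rise >= 0.15:
        # Rejection: rise of at least 15%, then either a fall to within
        # 5% of baseline within 7 days of treatment, or graft loss.
        responded = post_treatment_cr <= baseline_cr * 1.05
        if responded or graft_lost_to_rejection:
            return "rejection"
    elif abs(rise) < 0.05:
        # Simplified stand-in for the 'protocol biopsy, <5% change' arm.
        return "non-rejection"
    # Anything else would have been excluded from the study.
    return "indeterminate"
```

Cases that the rule cannot resolve map to 'indeterminate', mirroring the paper's exclusion of biopsies without a clear retrospective diagnosis.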
© 1999 Blackwell Science Ltd, Histopathology, 35, 461–467.
Neural network diagnosis of rejection 463
Table 1 (continued)

Interstitial haemorrhage: 0–3 (absent; 1–25%; 26–50%; more than 50% of area)
Eosinophils: number per single high-power field in most heavily infiltrated area
Plasma cells: number per single high-power field in most heavily infiltrated area
Arterial endothelial mononuclear cell adherence: present or absent (cells adherent to luminal surface of endothelium; c/f intimal arteritis)
Venous endothelial mononuclear cell adherence: present or absent (cells adherent to luminal surface of endothelium; c/f venulitis)
NETWORK CONSTRUCTION AND TRAINING
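The paper built its single-layer perceptron with the MATLAB neural network toolbox; those construction details are not reproduced here. As a minimal stand-in, the sketch below implements the classic perceptron update on 12-element feature vectors (one grade per histological feature). The learning rate, epoch count, and weight initialization are illustrative assumptions, not values from the study.

```python
import random

def train_perceptron(cases, epochs=50, lr=0.1, n_features=12, seed=0):
    """Train a single-layer perceptron.

    cases: list of (features, target) pairs, where features is a list of
    n_features histological grades and target is 1 (rejection) or 0 (not).
    Returns the learned weights and bias.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    b = 0.0
    for _ in range(epochs):
        for x, target in cases:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y            # classic perceptron error-driven update
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    """Threshold the weighted sum of feature grades."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

After training, the learned weights play the role of the per-feature 'weight' that the authors contrast with the hand-programmed importances of a Bayesian belief network.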
Table 2. Train with 100 random cases; test with 'Banff' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            2             0             2
           Non-rejection        9            10            19
           Total               11            10            21

Table 3. Train with 100 random cases; test with 'Oxford' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            0             0             0
           Non-rejection       21             4            25
           Total               21             4            25

Table 5. Train with 100 random cases and 'Oxford' set; test with 'Banff' set using logistic regression

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            7             8            15
           Non-rejection        4             2             6
           Total               11            10            21
selected because the diagnosis or exclusion of rejection had been difficult, the performance of the network was very disappointing. With the 21 'Banff' cases, the results are shown in Table 2. Only 11 correct diagnoses were achieved.
With the Oxford set, results are shown in Table 3. Only 4/25 were correct, which is fewer than might be expected by chance. With both sets the network's errors were exclusively under-diagnosis of rejection, as one might expect as a consequence of training with 'obvious' cases of rejection, but testing with less severe cases. At this stage the network was assigning most significance to the weight of the interstitial infiltrates.
However, when the network had been trained not only with the 100 training cases but also with the 25 'Oxford' cases, its success rate with the 'Banff' cases improved to 19/21 (Table 4).
This compares favourably with the highest number of correct diagnoses provided by any single UK pathologist (18/21),6 with the number of correct diagnoses produced when the 'Banff' cases were reviewed in the laboratory of the main architect of the Banff classification (15/21), and very favourably with the mean number of correct diagnoses provided by all the UK pathologists (64.8% using a 'conventional' diagnostic approach and 63.3% using the Banff classification).6 It also compares very favourably with the results of logistic regression, when applied to the same data (Table 5).
If the network was trained with the 100 training cases and with the 21 Banff cases, the number of correct diagnoses achieved when testing with the Oxford group improved to 24/25 (Table 6).
Again, this compares favourably with the results of using logistic regression with the same data (Table 7).
After training with the more difficult cases, the network was using all the histological features except acute glomerulitis. To our surprise, relatively little weight was assigned to tubulitis, which is the feature accorded the most prominence in the Banff classification of transplant pathology. This is probably because the test cases were selected as having caused diagnostic difficulty; by definition, if tubulitis had permitted a clear diagnostic distinction, such diagnostic difficulty would not have arisen.

Discussion

Neural networks have been proposed as useful tools in decision-making in a variety of medical applications for a number of years (reviewed in 10). A relatively simple form of inference network, the Bayesian belief network, has recently received some prominence in the
Table 4. Train with 100 random cases and 'Oxford' set; test with 'Banff' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection            9             0             9
           Non-rejection        2            10            12
           Total               11            10            21

Table 6. Train with 100 random cases and 'Banff' set; test with 'Oxford' set (see text)

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection           21             1            22
           Non-rejection        0             3             3
           Total               21             4            25
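The 'n correct' figures quoted throughout the Results are simply the diagonal of each predicted-versus-clinical table. A small sketch, using Table 4's counts (the predicted-rejection row is inferred from the column totals; the variable names are illustrative):

```python
def correct_diagnoses(table):
    """Number of correct diagnoses in a predicted-vs-clinical table.

    table[i][j] = biopsies predicted as class i whose clinical diagnosis
    was class j (class order: rejection, non-rejection). Correct calls
    are those on the diagonal.
    """
    return sum(table[i][i] for i in range(len(table)))

# Table 4: network trained on the 100 routine cases plus the 'Oxford' set,
# tested on the 21 'Banff' cases.
banff_test = [[9, 0],    # predicted rejection
              [2, 10]]   # predicted non-rejection
# correct_diagnoses(banff_test) gives the 19/21 quoted in the text.
```

The same function applied to the logistic-regression tables gives the lower scores the network is compared against.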
pathological literature.11,12 Nevertheless, computer-based networks or 'decision support systems' have not found widespread use in routine practice, with the possible exception of automated cervical cytology screening systems.13,14 Two possible explanations are evident. In some applications they produce little or no improvement beyond conventional 'expert opinion' or more conventional statistical approaches such as logistic regression.8 In most applications they require careful data collection and input, which is relatively laborious.

Table 7. Train with 100 random cases and 'Banff' set; test with 'Oxford' set using logistic regression

                            Clinical diagnosis
                            Rejection   Non-rejection   Total
Predicted: Rejection           16             3            19
           Non-rejection        5             1             6
           Total               21             4            25

Our results indicate that the first of these objections need not apply in the context of acute allograft rejection. The diagnosis of acute rejection is clinically important. It is usually based mainly on interpretation of the biopsy appearances; such interpretation is often difficult, even for experienced practitioners.6 We have shown that it is possible to use observations made by a trainee histopathologist to address the diagnosis of acute rejection in difficult cases, and thus produce a very high proportion of correct diagnoses. When tested with the 21 'Banff' cases, this approach allowed the trainee pathologist to perform better than any single UK renal transplant pathologist, better than review by staff in the laboratory which initiated the Banff classification of transplant pathology,5 and much better than the average for the UK's transplant pathologists. The results were also better than could be achieved by applying logistic regression to the same data.
It is notable, however, that the quality of the results is crucially dependent on the quality of the network training. In this context it must be admitted that there is a problem with identifying the 'gold standard' diagnosis against which the results of histological examination are judged. In the field of transplant rejection, our approach of retrospective clinical review against preset standards has been widely used; indeed, our standards were more stringent than have sometimes been used. Errors in the 'agreed diagnosis' cannot be excluded completely, but at least we were able to compare the histological diagnosis with some external standard, rather than comparing the histological diagnosis from the network with the histological diagnosis of an expert histopathologist, which could become a self-fulfilling prophecy.
If our agreed retrospective diagnoses are accepted, then training the network with 'easy' cases provided an apparently good performance when tested with the same, 'easy' cases; but the network was relying heavily on the weight of the interstitial lymphocytic infiltrates. Lymphocytic infiltration is a prominent feature of acute rejection, but it is known to be not specific. It is therefore not surprising that at this stage the network did not provide an acceptable performance when tested with 'difficult' cases, and gave an under-diagnosis of rejection. It was necessary first to expose the network to data from comparably difficult cases before good results were achieved.
In this study there was no interobserver variation, as all the observations were made by one individual (JIK), but the sensitivity of the network to the quality of the training data suggests that interobserver variation could be a significant problem. It might be necessary for every pathologist who wishes to use such a system to train it with their own observations. Differences between institutions and populations may also have a major impact. We have some preliminary data which suggest that even if the observer does not change, a move from the UK to Pakistan results in a network becoming unusable until it has been re-trained with local cases (JIK, manuscript in preparation). Although laborious, this could have considerable advantages, as the output would then be related directly and reproducibly to diagnoses made by retrospective clinical review, and problems of interobserver and interinstitution variation could be almost eliminated.
The second objection, of excessive labour requirements, is more difficult to rebut. Our approach requires the systematic, separate evaluation of several different histological features from each biopsy, with subsequent data entry into a computer. The balance of this argument has perhaps been modified by the recent wide acceptance of the Banff classification of transplant pathology, which requires a similar numeric evaluation of several features, including tubulitis, which in practice takes longer to evaluate than any other feature. Other features (such as eosinophil infiltration, lymphocyte activation) are under evaluation for incorporation into the Banff scheme. Some of the features which we used (such as venous endothelial changes) can probably be dropped without loss of diagnostic accuracy. Consequently, the work involved in data acquisition may be little changed by a neural network approach. Data entry is a separate problem. At present the network is set up using the mathematical data