
This is an enhanced PDF from The Journal of Bone and Joint Surgery


The Neer classification system for proximal humeral fractures. An assessment of interobserver reliability and intraobserver reproducibility
ML Sidor, JD Zuckerman, T Lyon, K Koval, F Cuomo and N Schoenberg


J Bone Joint Surg Am. 1993;75:1745-1750.


Copyright 1993 by The Journal of Bone and Joint Surgery, Incorporated

The Neer Classification System for Proximal Humeral Fractures

AN ASSESSMENT OF INTEROBSERVER RELIABILITY AND INTRAOBSERVER REPRODUCIBILITY*

BY MICHAEL L. SIDOR, M.D.†, JOSEPH D. ZUCKERMAN, M.D.†, TOM LYON, B.S.†, KENNETH KOVAL, M.D.†, FRANCES CUOMO, M.D.†, AND NORMAN SCHOENBERG, M.D.†, NEW YORK, N.Y.

Investigation performed at the Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, New York City

ABSTRACT: The radiographs of fifty fractures of the proximal part of the humerus were used to assess the interobserver reliability and intraobserver reproducibility of the Neer classification system. A trauma series consisting of scapular anteroposterior, scapular lateral, and axillary radiographs was available for each fracture. The radiographs were reviewed by an orthopaedic shoulder specialist, an orthopaedic traumatologist, a skeletal radiologist, and two orthopaedic residents, in their fifth and second years of postgraduate training. The radiographs were reviewed on two different occasions, six months apart.

Interobserver reliability was assessed by comparison of the fracture classifications determined by the five observers. Intraobserver reproducibility was evaluated by comparison of the classifications determined by each observer on the first and second viewings. Kappa (κ) reliability coefficients were used.

All five observers agreed on the final classification for 32 and 30 per cent of the fractures on the first and second viewings, respectively. Paired comparisons between the five observers showed a mean reliability coefficient of 0.48 (range, 0.43 to 0.58) for the first viewing and 0.52 (range, 0.37 to 0.62) for the second viewing. The attending physicians obtained a slightly higher kappa value than the orthopaedic residents (0.52 compared with 0.48). Reproducibility ranged from 0.83 (the shoulder specialist) to 0.50 (the skeletal radiologist), with a mean of 0.66. Simplification of the Neer classification system, from sixteen categories to six more general categories based on fracture type, did not significantly improve either interobserver reliability or intraobserver reproducibility.

Fractures of the proximal part of the humerus are most commonly classified with use of the system introduced by Neer in 1970[13,14]. This system is based on the presence of displacement of at least one of the four anatomical parts of the proximal part of the humerus. Decisions regarding treatment are determined mainly by the type of fracture that is present.

*No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study.
†Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, 301 East 17th Street, New York, N.Y. 10003.

For any system for the classification of fractures, excellent reliability and reproducibility among all reviewers in the interpretation of the radiographs and the classification of the injuries are desirable features. Interobserver reliability refers to the level of agreement between different observers for the classification of a specific fracture. Reproducibility, or intraobserver reliability, indicates the level of agreement for one observer for the classification of a specific fracture on separate occasions. Interobserver reliability and intraobserver reproducibility of even the most widely accepted classification systems have been assessed only infrequently. Recently, classification systems for fractures of the proximal part of the femur[1,6], the ankle[15,17], and the carpal scaphoid[4] have been reviewed, and, in general, the degree of interobserver reliability has been disappointing. Kristiansen et al. assessed the Neer classification system in a similar manner and found the results to be highly dependent on the level of experience of the observer[9]. However, they used only anteroposterior and lateral radiographs rather than the standard trauma series, and they classified the fractures according to a simplified system. They did not assess reproducibility.

The purpose of the current study was to assess the degree of interobserver reliability and intraobserver reproducibility of the Neer classification system for proximal humeral fractures with use of the standard trauma series of radiographs.

Materials and Methods

The radiographs of fifty proximal humeral fractures or fracture-dislocations in adults who had been seen in the emergency room or offices at our institution were chosen for inclusion in the study. A standard trauma series - scapular anteroposterior, scapular lateral, and axillary radiographs - of good quality was available for each patient. The radiographs were made with use of a standardized technique. In nearly all patients, the proximal humeral fracture was an isolated injury. Therefore, the scapular anteroposterior and scapular lateral radiographs were made with the patient standing, with the arm on the chest (usually in a sling or shoulder-immobilizer). This allowed the patient to be positioned oblique to the x-ray beam. The axillary radiograph was made with the patient supine, with the arm abducted approximately 60 to 70 degrees in the plane of the scapula but remaining in the same position of rotation as for the scapular anteroposterior and scapular lateral radiographs. In most situations, because of the acute nature of the injury, positioning of the patient for the axillary radiograph was performed by an orthopaedic surgeon. If the patient was unable to stand, all radiographs were made with the patient supine.

The acceptability of the quality of the radiographs (for both projection and clarity) was determined by two orthopaedic surgeons who did not serve as observers for this study. They agreed that the fifty fractures included most of the different patterns of proximal humeral fractures. All identifying data (other than labels indicating the right or left side) were obscured on the radiographs. The series was arranged in random order and numbered as Cases 1 through 50.

The radiographs were reviewed by five observers: an orthopaedic surgeon subspecializing in problems of the shoulder (a shoulder specialist), an orthopaedic surgeon subspecializing in traumatology (a traumatologist), a radiologist specializing in imaging of the musculoskeletal system (a skeletal radiologist), and two orthopaedic residents (one in the fifth year of postgraduate training and one in the second year). None of the observers were informed of the other participants in the study. This was done to avoid any discussion of the radiographs after the testing sequence.

Each observer was familiar with the Neer classification system and had used it clinically. However, to standardize the information available to the observers at the time of testing, we provided each one with a typed summary of the classification system. The Neer scheme for the classification of proximal humeral fractures comprises four segments: the articular segment, the greater tuberosity, the lesser tuberosity, and the humeral shaft. For a segment to be considered displaced, it must be displaced by more than 1.0 centimeter or angulated more than 45 degrees. Displaced fractures are classified as two, three, or four-part fractures on the basis of the number of displaced segments. There are separate categories for fracture-dislocations (anterior or posterior) and for fractures of the articular surface (so-called impression fracture or head-splitting fracture). During testing, each observer was given a diagram of the fracture classification system that has appeared often in the literature[13] (Fig. 1). Each observer indicated a classification for each fracture on the diagram by choosing one of the sixteen different possibilities.

The radiographs were reviewed by each observer on two separate occasions, six months apart. The observers were not provided with any feedback after the first testing. The radiographs were not available to any of the observers between the first and second viewings. In addition, the observers' classification choices made at the first testing were not available during the second testing. During the first review, the observers did not know that they would be retested.

During the first testing, the observers were given the typed summary of the Neer classification system and were allotted a maximum of five minutes to read and review it. They were not allowed to ask questions concerning the information contained in the summary. They were then given a data form, which included the diagram of the classification system mentioned earlier. A metric ruler and a goniometer were available for use during testing. The trauma series (scapular anteroposterior, scapular lateral, and axillary radiographs) was reviewed for each fracture. Decisions about classification were made on the basis of the entire trauma series, with the observers indicating their choices on the diagram. The observers were given an unlimited amount of time to make their decisions. After a decision had been made, the radiographs for the next fracture were presented. The observers were not permitted to ask questions of the proctor during or after review of the radiographs.

The second testing was performed in an identical manner, except that the series of radiographs was shown in reverse order to inhibit the observers' recall of the decisions made during the first testing.

The interobserver reliability was assessed by comparison of the classifications decided on by the five different observers for each of the fifty fractures. The intraobserver reproducibility was determined by comparison of the classifications decided on by each individual observer for the first and second testing sessions.

Statistical Analysis

Computer-generated kappa statistics (PC-Agree, version 2.5; McMaster University, Hamilton, Ontario, Canada) were used to analyze interobserver reliability and intraobserver reproducibility. This analysis involves adjustment of the observed proportion of agreement between or among observers by correction for the proportion of agreement that could have occurred by chance. Hence, the adjusted values are almost always lower than the observed values for the proportion of agreement. The kappa coefficients range from +1.0 (complete agreement) through 0 (chance agreement) to less than 0 (less agreement than expected by chance). The lower boundary of kappa in this study, with use of five observers, was -0.25.

We used the guidelines proposed by Landis and Koch for interpretation of these values to categorize the kappa coefficients. Values of less than 0.00 indicated poor reliability; 0.00 to 0.20, slight reliability; 0.21 to 0.40, fair reliability; 0.41 to 0.60, moderate reliability; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, excellent or almost perfect agreement. The kappa coefficients for agreement among the two orthopaedic residents were compared with those among the three attending physicians with use of a Student t test that incorporated the standard errors of kappa for these two groups.
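The chance correction described above can be illustrated with a short sketch. This is not the PC-Agree software used in the study; it is a minimal, assumed implementation of kappa for a single pair of observers, together with the Landis and Koch interpretation bands quoted in the text.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by
    chance, from each observer's marginal category frequencies.
    """
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Interpretation bands of Landis and Koch, as given in the text."""
    if kappa < 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, two observers who agree on three of four fractures can still receive a kappa well below 0.75 once the marginal frequencies of their choices are taken into account, which is why the adjusted values reported in this study are almost always lower than the raw proportions of agreement.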





FIG. 1
Diagram showing the Neer classification system for proximal humeral fractures. (Modified from Neer, C. S., II: Displaced proximal humeral
fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1079, Sept. 1970.)
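The displacement rule behind the diagram lends itself to a small worked sketch. The function names and the measurement format below are hypothetical, and the sketch covers only the one- to four-part logic; fracture-dislocations and articular-surface fractures, which occupy separate categories in the scheme, are omitted.

```python
# Hypothetical encoding of the displacement rule described in the text:
# a segment counts as displaced only if it is displaced by more than
# 1.0 cm or angulated more than 45 degrees, and the fracture is then
# named by the number of displaced segments.
SEGMENTS = ("articular segment", "greater tuberosity",
            "lesser tuberosity", "humeral shaft")

def is_displaced(displacement_cm, angulation_deg):
    return displacement_cm > 1.0 or angulation_deg > 45.0

def part_count(measurements):
    """measurements: dict mapping a segment name to (cm, degrees).

    Returns 1 for a minimally displaced (one-part) fracture, otherwise
    2, 3, or 4 according to the number of displaced segments.
    """
    displaced = sum(is_displaced(cm, deg)
                    for cm, deg in measurements.values())
    return 1 + displaced
```

A fracture line through the greater tuberosity with 0.4 centimeter of displacement would therefore still be classified as one-part, which is one reason small measurement disagreements near the thresholds can change the assigned category.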

We did not attempt to assess accuracy - that is, how close an experimental observation lies to a true value - because that would have required a known correct classification for each fracture that was assessed. These data were not available because the classification of each fracture is a matter of interpretation by observers. Rather, we assessed the level of agreement among different observers. It is important to note that agreement (reliability and reproducibility) does not necessarily reflect accuracy, for the reasons just stated.

Results

Interobserver Reliability

All five observers agreed on the classification for 32 per cent of the fractures during the first testing and for 30 per cent during the second testing. At least four observers agreed on the classification for 54 per cent of the fractures during the first testing and for 62 per cent during the second testing. When we decreased the extent of agreement to at least three of the five observers, there was 88 per cent agreement for both the first and second testings. (Examples of fractures for which there was poor agreement are shown in Figures 2 and 3.)

We analyzed the results on the basis of a pairwise comparison among the five observers, which produced ten paired analyses, and according to the over-all agreement among the group. For the first viewing, the interobserver reliability coefficients (kappa) ranged from 0.43 to 0.58, with an over-all value of 0.48. For the second viewing, the kappa values ranged from 0.37 to 0.62, with an over-all value of 0.52. There were no significant differences between the results of the first and second viewings. For the first viewing, the best paired comparison was between the traumatologist and the second-year orthopaedic resident and the worst, between the traumatologist and the skeletal radiologist. For the second viewing, the best paired comparison was also between the traumatologist and the second-year resident and the worst, between the skeletal radiologist and the fifth-year resident. The over-all kappa value for all paired comparisons was 0.50, which represents moderate interobserver reliability. None of the paired comparisons achieved almost-perfect interobserver reliability (κ ≥ 0.81).

The effect of the level of expertise of the observers was also assessed. The three attending physicians (the shoulder specialist, the traumatologist, and the skeletal




FIG. 2
Figs. 2 and 3: Radiographs showing fractures for which there was poor interobserver agreement.
Fig. 2: This fracture was classified as an articular surface fracture, a two-part surgical-neck fracture, a two-part fracture of the lesser
tuberosity, and a three-part fracture of the lesser tuberosity.

radiologist) achieved reliability coefficients of 0.47 for the first viewing and 0.56 for the second viewing (mean, 0.52). The reliability coefficients for the orthopaedic residents were 0.44 for the first viewing and 0.51 for the second viewing (mean, 0.48). This represents moderate reliability for both the attending physicians and the orthopaedic residents.

Intraobserver Reproducibility (Table I)

For all five observers, the reliability coefficient was 0.66, which represents a substantial level of reproducibility. The only observer to achieve an almost-perfect level was the shoulder specialist (κ = 0.83); all others were in the moderate to substantial range (κ = 0.50 to 0.68). There was no significant difference in reproducibility when the values obtained by the attending physicians were compared with those obtained by the orthopaedic residents.

Effect of Simplification of the Classification System

The Neer classification system includes sixteen possible categories for any fracture. We believed that the relatively low interobserver reliability and intraobserver reproducibility might have been caused by the complexity of the system[17]. Therefore, we simplified the sixteen possible choices by dividing them into six categories: one-part fractures (type 1), two-part fractures (types 2 through 5), three-part fractures (types 8 and 9), four-part fractures (type 12), fracture-dislocations (types 6, 7, 10, 11, 13, and 14), and fractures of the articular surface (types 15 and 16). The original data were analyzed again on the basis of this simplified system; the observers were not re-tested but, rather, their original classifications were regrouped.

There was no improvement for either interobserver reliability or intraobserver reproducibility with use of this simplified system. Rather, the interobserver reliability coefficients (kappa) for the first and second viewings decreased slightly: to 0.42 for the first viewing and to 0.48 for the second viewing (compared with 0.48 and 0.52 for the first and second viewings, respectively, with use of the sixteen-category classification system). The value for reproducibility remained the same (0.66) with use of both systems. (All of the


FIG. 3
This fracture was classified as a two-part fracture of the greater tuberosity, a two-part surgical-neck fracture, a three-part fracture of the greater tuberosity, and a four-part fracture.




values are adjusted kappa reliability coefficients.)

TABLE I
INTRAOBSERVER REPRODUCIBILITY AFTER TWO REVIEWS OF THE FIFTY FRACTURES, SIX MONTHS APART

Reviewer                              Non-Adjusted*    Adjusted†
Shoulder specialist                        0.86           0.83
Orthopaedic traumatologist                 0.70           0.64
Skeletal radiologist                       0.62           0.50
Fifth-year orthopaedic resident            0.74           0.68
Second-year orthopaedic resident           0.70           0.63
Mean                                       0.72           0.66

*The proportion of cases that was classified the same on both viewings.
†Reliability coefficient (kappa value).

Discussion

Systems for the classification of fractures occupy a central role in the practice of orthopaedic surgery. They constitute a means for the description of fractures and fracture-dislocations, and they provide important guidelines for treatment. Such systems have been used frequently in the orthopaedic literature to describe the results of specific treatments. Our ability to compare the results of various treatments depends in large part on the assumption that injuries of comparable severity are being treated. Currently, fracture-classification systems are the most common mechanism for assessment of the comparability of different series.

Therefore, it is important that fracture-classification systems be both reliable and reproducible. However, the commonly used systems have infrequently been evaluated for interobserver reliability and intraobserver reproducibility, and the results of these few evaluations have been disappointing. Nielsen et al. found a low interobserver reliability and intraobserver reproducibility for the Lauge-Hansen classification of fractures of the ankle and concluded that the system was difficult to apply in a reproducible manner[15]. Thomsen et al. evaluated the Lauge-Hansen and Weber classifications of ankle fractures and reported similar results. Frandsen et al. assessed Garden's classification scheme for fractures of the femoral neck and found the interobserver reliability to be poor. Andersen et al. reported similar results in their evaluation of Evans' classification for intertrochanteric fractures, although the interobserver reliability was somewhat better than that for Garden's system.

The Neer classification is the most widely used scheme for proximal humeral fractures. It has gained wide clinical acceptance by orthopaedic surgeons and radiologists and is considered to have important implications for both treatment options and outcomes[5,6,12,14,16]. However, to our knowledge, only one report in the orthopaedic literature has dealt with its reliability. Kristiansen et al. reported a low level of interobserver reliability among four observers of varying expertise who evaluated a series of 100 proximal humeral fractures[9]. These authors found the level of expertise to be an important factor in the prediction of interobserver reliability. Their study, however, had a few important limitations. First, they condensed the classification into five groups (one-part, two-part, three-part, and four-part fractures, and all other fractures and fracture-dislocations), which were somewhat disparate in terms of fracture type, treatment options, and prognosis. Second, they did not use a complete trauma series but relied only on anteroposterior and lateral radiographs. Finally, reproducibility was not assessed[9].

We used a complete trauma series to evaluate fifty proximal humeral fractures that were classified on the basis of the detailed (sixteen-category) system of Neer. We found a moderate level of interobserver reliability among the five observers. When we analyzed the results using pairs of observers, the reliability coefficient ranged from 0.37 to 0.62. An almost-perfect level of reliability (κ ≥ 0.81) was not obtained for any paired evaluation. It is interesting that the reliability coefficients did not improve when we employed a simplified (six-category) version of the classification system.

Although we used a complete trauma series, which we thought would provide the maximum amount of information that could be obtained from plain radiographs, the reliability did not achieve the high levels that would be expected for such a widely used and accepted classification scheme. We attribute this to several factors. First, proximal humeral fractures are inherently complex injuries with multiple fracture lines, making it difficult to assess the displacement of one segment in relation to another. Second, although we used only radiographs of good quality, the overlapping of osseous densities in this anatomical region increases the difficulty of interpretation. Any compromise of radiographic technique can be expected to exacerbate this problem. Additional radiographic studies might have provided information that would have increased the level of reliability. Computerized tomographic scans have been recommended for evaluation of the degree of displacement of the tuberosities as well as for assessment of head-splitting fractures, articular impression fractures, and chronic fracture-dislocations[2,3,7,10]. However, their use is not currently considered part of the standard evaluation of these fractures. Finally, it is possible that the criteria for displacement (more than one centimeter of displacement or 45 degrees of angulation) are too difficult to measure accurately on radiographs. None of our observers used a goniometer or a metric ruler to measure displacement, although all were given the option to do so. The level of expertise and experience also can affect interobserver reliability[9,17], although this did not appear to be a significant factor when we compared the values




obtained by the attending physicians with those obtained by the orthopaedic residents.

Intraobserver reproducibility was found to be higher than interobserver reliability in the current study, and this is consistent with other reports[1,4,15,17]. The level of expertise and experience was a significant factor. Of the five observers, the shoulder specialist obtained the highest correlation coefficient (κ = 0.83). This was also the only value in the entire study that was at the almost-perfect level (κ ≥ 0.81) of reliability. Intraobserver reproducibility usually exceeds interobserver reliability because it reflects reproducibility independent of agreement. Therefore, incorrect responses that are repeated may show good intraobserver reproducibility but poor interobserver reliability.

Important to any fracture classification system is its relationship to the choice of treatment. In this respect, interobserver reliability and intraobserver reproducibility are major considerations. Differences in classification between observers that do not result in different recommendations for treatment for a particular fracture are considerably less important. Regardless of whether a fracture is classified as two-part by one observer or three-part by another, a procedure to reduce and fix the fracture will generally be recommended. However, a fracture classified as minimally displaced by one observer and as three-part by another may be treated differently by the two observers. Thus, not all differences in classification are equal with respect to the implications for treatment and outcome.
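The six-group collapse used in this study (type 1 as one-part, types 2 through 5 as two-part, and so on) can be sketched as follows. The helper names are hypothetical, but the grouping itself is the one stated in the text; the sketch also shows why simplification need not raise kappa: collapsing categories increases raw agreement, but it increases the agreement expected by chance as well.

```python
from collections import Counter

# The six-group collapse of the sixteen Neer diagram categories, as
# described in the text (keys are positions on the classification diagram).
GROUP = {1: "one-part",
         2: "two-part", 3: "two-part", 4: "two-part", 5: "two-part",
         8: "three-part", 9: "three-part",
         12: "four-part",
         6: "fracture-dislocation", 7: "fracture-dislocation",
         10: "fracture-dislocation", 11: "fracture-dislocation",
         13: "fracture-dislocation", 14: "fracture-dislocation",
         15: "articular surface", 16: "articular surface"}

def kappa(a, b):
    """Chance-corrected agreement between two observers' ratings."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    fa, fb = Counter(a), Counter(b)
    p_e = sum(fa[c] * fb[c] for c in set(fa) | set(fb)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def regrouped_kappa(types_a, types_b):
    """Collapse each observer's sixteen-category choices into the six
    groups before computing kappa; the observers are not re-tested,
    only their original classifications are regrouped."""
    return kappa([GROUP[t] for t in types_a], [GROUP[t] for t in types_b])
```

With invented data for two observers, a disagreement between types 2 and 3 (both two-part fractures) disappears after regrouping, raising raw agreement; on a full data set, however, the simultaneously increased chance agreement can leave the adjusted kappa unchanged or slightly lower, as was observed in this study.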

References
1. Andersen, E.; Jorgensen, L. G.; and Hededam, L. T.: Evans' classification of trochanteric fractures: an assessment of the interobserver and intraobserver reliability. Injury, 21: 377-378, 1990.
2. Bigliani, L. U.: Fractures of the shoulder. Part I: fractures of the proximal humerus. In Fractures in Adults, edited by C. A. Rockwood, Jr., D. P. Green, and R. W. Bucholz. Ed. 3, vol. 1, pp. 881-882. Philadelphia, J. B. Lippincott, 1991.
3. Castagno, A. A.; Shuman, W. P.; Kilcoyne, R. F.; Haynor, D. R.; Morris, M. E.; and Matsen, F. A.: Complex fractures of the proximal humerus: role of CT in treatment. Radiology, 165: 759-762, 1987.
4. Dias, J. J.; Taylor, M.; Thompson, J.; Brenkel, I. J.; and Gregg, P. J.: Radiographic signs of union of scaphoid fractures. An analysis of inter-observer agreement and reproducibility. J. Bone and Joint Surg., 70-B(2): 299-301, 1988.
5. Fleiss, J. L.: Statistical Methods for Rates and Proportions. Ed. 2, p. 217. New York, John Wiley and Sons, 1981.
6. Frandsen, P. A.; Andersen, E.; Madsen, F.; and Skjødt, T.: Garden's classification of femoral neck fractures. An assessment of inter-observer variation. J. Bone and Joint Surg., 70-B(4): 588-590, 1988.
7. Kilcoyne, R. F.; Shuman, W. P.; Matsen, F. A., III; Morris, M.; and Rockwood, C. A.: The Neer classification of displaced proximal humeral fractures: spectrum of findings on plain radiographs and CT scans. AJR: Am. J. Roentgenol., 154: 1029-1033, 1990.
8. Kristiansen, B., and Christensen, S. W.: Proximal humeral fractures. Late results in relation to classification and treatment. Acta Orthop. Scandinavica, 58: 124-127, 1987.
9. Kristiansen, B.; Andersen, U. L.; Olsen, C. A.; and Varmarken, J. E.: The Neer classification of fractures of the proximal humerus. An assessment of interobserver variation. Skel. Radiol., 17: 420-422, 1988.
10. Kuhlman, J. E.; Fishman, E. K.; Ney, D. R.; and Magid, D.: Complex shoulder trauma: three-dimensional CT imaging. Orthopedics, 11: 1561-1563, 1988.
11. Landis, J. R., and Koch, G. G.: The measurement of observer agreement for categorical data. Biometrics, 33: 159-174, 1977.
12. Mills, H. J., and Horne, G.: Fractures of the proximal humerus in adults. J. Trauma, 25: 801-805, 1985.
13. Neer, C. S., II: Displaced proximal humeral fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1077-1089, Sept. 1970.
14. Neer, C. S., II: Displaced proximal humeral fractures. Part II. Treatment of three-part and four-part displacement. J. Bone and Joint Surg., 52-A: 1090-1103, Sept. 1970.
15. Nielsen, J. O.; Dons-Jensen, H.; and Sorensen, H. T.: Lauge-Hansen classification of malleolar fractures. An assessment of the reproducibility in 118 cases. Acta Orthop. Scandinavica, 61: 385-387, 1990.
16. Seemann, W.-R.; Siebler, G.; and Rupp, H.-G.: A new classification of proximal humeral fractures. European J. Radiol., 6: 163-167, 1986.
17. Thomsen, N. O. B.; Overgaard, S.; Olsen, L. H.; Hansen, H.; and Nielsen, S. T.: Observer variation in the radiographic classification of ankle fractures. J. Bone and Joint Surg., 73-B(4): 676-678, 1991.

