Concordance of Breast Imaging Reporting and Data System Assessments and Management Recommendations in Screening Mammography¹ (January 2003, Volume 226, Issue 2)

¹ From the Center for Health Studies, Group Health Cooperative of Puget Sound, 1730 Minor Ave, Suite 1600, Seattle, WA 98101 (S.H.T.); affiliations for all other authors and author contributions are listed at the end of this article. Received March 20, 2001; revision requested April 23; revision received June 20; accepted August 9. Supported by grants U01CA86076 (W.E.B., L.E.I.), U01CA86082 (P.A.C.), U01CA63736 (M.B.D.), U01CA70013 (B.M.G.), U01CA63740 (K.K., E.A.S., V.L.E.), U0169976 (R.D.R.), U01CA63731 (S.H.T.), CAU01CA70040 (B.C.Y.), and R01CA63146 (N.U.) from the National Cancer Institute.

	ABSTRACT

PURPOSE: To examine how frequently Breast Imaging Reporting and Data System (BI-RADS) mammographic screening assessments were associated with expected clinical management recommendations.

MATERIALS AND METHODS: Seven Breast Cancer Surveillance Consortium mammography registries recorded screening assessments and recommendations in 1997 to identify the proportion of women in each BI-RADS category. The first screening assessment for a woman without cancer or a prior mammogram within 9 months was associated with its independently recorded recommendation.

RESULTS: Among 292,795 women, screening assessments included 269,022 (91.9%) with a "negative" or "benign finding," and 267,103 (99.3%) of these women were recommended for normal interval follow-up. Among 11,861 (4.1%) women with screening assessments of "probably benign finding," 4,782 (40.3%) were recommended for short interval follow-up as expected on the basis of the BI-RADS, but a high proportion (36.9%) were recommended for additional imaging. Among 1,625 (0.6%) women with "suspicious abnormality," most were recommended for biopsy (48.7%) or clinical examination and/or surgical consult (9.0%), but many were recommended for additional imaging (38.7%). Among 243 (0.1%) women with screening assessments "highly suggestive of malignancy," a majority were recommended for biopsy (73.3%) or clinical examination and/or surgical consult (18.1%) consistent with BI-RADS, but some were recommended for additional imaging (6.6%).

CONCLUSION: BI-RADS assessments and management recommendations are consistent for negative and benign assessments, but inconsistencies were found in assessments and recommendations for mammographic abnormalities.

Index terms: Breast, ACR Reporting and Data System • Breast neoplasms, diagnosis, 00.30 • Breast radiography, quality assurance, 00.11 • Cancer screening, 00.11, 00.30

	INTRODUCTION

With the diffusion of screening mammography, there has been a growing interest in standardizing terminology to encourage consistency between assessments and management recommendations, as well as to facilitate radiologists’ audits of their interpretive performance (1–3). The diffusion occurred as promotional efforts increased attention to the mortality reductions afforded by screening mammography (4,5). To address concerns about terminology, the American College of Radiology created the Breast Imaging Reporting and Data System (BI-RADS), which includes six numbered assessment categories of 0–5 with associated management recommendations (Table 1) (2). The BI-RADS committee intended that almost all screening assessments would be category 1, 2, or 0 (Sickles EA, oral communication, 2001). At the beginning of a National Cancer Institute–sponsored effort to develop a national consortium of mammography registries (the Breast Cancer Surveillance Consortium [BCSC]) (6), radiologists at consortium facilities recorded BI-RADS assessments independently of recommendations. This recording occurred in accordance with customary practices of the time and provides a baseline for BI-RADS use because it occurred before BI-RADS usage was required in April 1999 by the Mammography Quality Standards Act (7).

TABLE 1. American College of Radiology BI-RADS Categories

The relationship between the assessment and management recommendation has implications for clinical care, teaching, and evaluating the screening interpretations of radiologists. When radiologists provide inconsistent recommendations for given assessments, it may create confusion among referring clinicians about whether and how to conduct further evaluations. Valid computations of performance for medical audits are predicated on a consistent clinical meaning for each assessment (8). The assessment is used to classify the interpretation as "positive" or "negative" for the measurement of sensitivity, specificity, and positive predictive value (7,8). If a specific assessment category implies additional evaluation for one radiologist and biopsy for another, then they are using the BI-RADS differently and continuing the confusion the system is intended to reduce. Although the text accompanying an interpretation may clarify some inconsistencies, the BI-RADS categoric classification system was intended to avoid the ambiguity of free text. If the categories are used in ways other than those intended, there may be opportunities to either change the terminology or address teaching to the areas of inconsistent use.

To our knowledge, no one has evaluated the use of the American College of Radiology BI-RADS assessment and management recommendations across multiple facilities and geographic locations in routine clinical practice. Important work has been done to evaluate the positive predictive value of BI-RADS assessments (9,10) and between-reader variation in the findings descriptions and management recommendations (11). However, these studies used unique populations or specifically drawn samples. The purpose of this study was to examine how frequently BI-RADS screening assessments were associated with the expected clinical management recommendation in diverse routine practice settings.

	MATERIALS AND METHODS

Setting
The mammography registries of the Breast Cancer Surveillance Consortium record assessments from multiple radiology facilities within their respective states of Colorado, North Carolina, New Hampshire, New Mexico, California (San Francisco), Washington (2 registries), and Vermont (6). Data from the mammography registries are sent to a central data repository at the BCSC Statistical Coordinating Center. All activities of the Breast Cancer Surveillance Consortium are reviewed and approved by the institutional review boards at each site. Individual informed consent was not required because of strict maintenance of anonymity and the observational nature of the study (12).

Subjects
This study includes mammographic information for women 40 years of age and older without a personal history of breast cancer who underwent a screening examination between January 1 and December 31, 1997, at any of the 158 participating mammography facilities within the seven BCSC registries. Throughout the manuscript, "facility" refers to the physical location where mammographic imaging is performed. A mammography registry refers to the sum of all the included facilities (range, 2–49 facilities per registry). The number of radiologists interpreting mammograms within a registry ranges from 10 to 165. The number of screening examinations per registry ranges from 11,563 to 80,911.

This report is restricted to 292,795 screening examinations identified as the first occurrence for a woman in the study period after the following exclusions: women with a prior history of breast cancer, women with additional imaging during the same visit, examinations with no recorded assessment, and examinations with a diagnostic indication. To further ensure that we included only screening mammograms, we also excluded any examination in 1997 that was preceded by a mammogram within 9 months.

Measurements
This study examined BI-RADS use by measuring the following: (a) the proportion of women assigned BI-RADS assessment categories 0–5 across the BCSC registries, (b) the proportion of women assigned specific recommendations, and (c) the proportion of assessments having the expected recommendation according to the BI-RADS.

We (S.H.T., L.E.I.) included the first recorded "routine screening" examination during the study for women in the registry and referred to it as the "index" examination. Demographic information (age, race, ethnicity, education), prior mammography experience, and assessments are collected in the course of usual care and pooled in a registry before being transferred periodically to the BCSC Statistical Coordinating Center. Encrypted unique identifiers for women, facilities, and radiologists maintain the anonymity of all data (12). The unique encrypted code is used to link records in the Statistical Coordinating Center over time (12).

Recording systems varied across the diverse facilities, but all included paper or automated summaries with an assessment and a management recommendation. Each assessment and management recommendation was recorded separately by the radiologist for each mammographic examination by using terminology consistent with the BI-RADS (described more fully later). The detailed text that might explain a recommendation could not be systematically recorded and summarized across facilities and was therefore not available for analysis. Examinations in which recording software created an automatic link between an assessment and management recommendation were excluded. This exclusion removed one registry (New Mexico) and several facilities that together accounted for 38,526 (11%) of the 350,632 women otherwise eligible for the study.

Assessments by radiologists at the study facilities corresponded with the six BI-RADS categories (Table 1) (3). These categories include one incomplete (category 0) and five final assessments (categories 1–5). Some facilities recorded two assessments at each examination, one for each breast, while others recorded a single assessment for each woman.

Recommendations for clinical follow-up were collected on the same form but recorded separately from the assessment. A single recommendation was given for 290,804 (99.3%) of all 292,795 assessments. For the 1,991 (0.7%) assessments with multiple recommendations, we included in this analysis the highest numbered recommendation among the following five ordered recommendations: (a) normal interval follow-up, (b) short interval follow-up (ie, <1 year), (c) needs additional imaging, (d) clinical examination or surgical consult, and (e) biopsy or fine-needle aspiration. The order reflects an increasing likelihood of malignancy. The data do not allow the separation of clinical examination and surgical consult recommendations. Women (14,814 [4.8%] of 308,003) with no recorded recommendation were categorized as "unknown" and not included in the tables or analysis. Some women (394 [0.1%] of 308,003) were given recommendations other than those listed earlier, which were recorded as "other" but not included in the tables. Recording of additional imaging did not include examinations to correct for technical deficiencies.
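The selection rule for multiple recommendations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the string labels, the function name `select_recommendation`, and the handling of a mix of listed and unlisted recommendations are hypothetical and are not taken from the registries' actual software.

```python
# Ordered recommendations as listed in the text, from 1 (low) to 5 (high).
# The label strings are illustrative, not the registries' actual codes.
RECOMMENDATION_RANK = {
    "normal interval follow-up": 1,
    "short interval follow-up": 2,
    "additional imaging": 3,
    "clinical examination or surgical consult": 4,
    "biopsy or fine-needle aspiration": 5,
}


def select_recommendation(recommendations):
    """Return the single recommendation to analyze for one assessment.

    - no recorded recommendation      -> "unknown"
    - only unlisted recommendations   -> "other"
    - otherwise                       -> the highest-ranked listed one
    Ignoring unlisted entries when a listed one is present is an assumption.
    """
    if not recommendations:
        return "unknown"
    listed = [r for r in recommendations if r in RECOMMENDATION_RANK]
    if not listed:
        return "other"
    return max(listed, key=RECOMMENDATION_RANK.get)


# Example: two recommendations recorded for the same assessment.
chosen = select_recommendation(
    ["short interval follow-up", "additional imaging"]
)
```

In this sketch, "unknown" and "other" results would be dropped before tabulation, mirroring the exclusions described above.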

Because some registries recorded assessments and management recommendations for each breast, we established a systematic method for assigning the assessment and management recommendation for a woman. We (S.H.T., L.E.I.) rank ordered the assessments based on potential cancer risk ("negative," "benign finding," "probably benign finding," "need additional imaging evaluation," "suspicious abnormality," "highly suggestive of malignancy"). When assessments were not the same for both breasts we (S.H.T., L.E.I.) assigned the woman the higher assessment. The recommendations for the breast were ordered from 1 (low) to 5 (high) as shown. When examining the relationship between assessments and recommendations, the breast with the higher assessment was included for each woman and associated with the recommendation for the same breast.
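A minimal Python sketch of this per-woman assignment follows. The tuple layout, the function name `woman_level`, and the tie-breaking rule (taking the higher recommendation when both breasts have the same assessment) are assumptions made for illustration; the text does not specify how ties were resolved.

```python
# BI-RADS categories ordered by potential cancer risk, as described in the
# text: negative (1), benign (2), probably benign (3), need additional
# imaging (0), suspicious (4), highly suggestive of malignancy (5).
ASSESSMENT_ORDER = [1, 2, 3, 0, 4, 5]
RISK_RANK = {category: rank for rank, category in enumerate(ASSESSMENT_ORDER)}


def woman_level(left, right):
    """Pick the (assessment, recommendation) pair that represents a woman.

    Each argument is a tuple (birads_category, recommendation_rank), with
    recommendation_rank on the 1 (low) to 5 (high) scale. The breast with
    the higher-risk assessment wins, and its own recommendation is kept.
    Breaking ties by the higher recommendation is an assumption.
    """
    return max(left, right, key=lambda breast: (RISK_RANK[breast[0]], breast[1]))


# Example: category 3 in one breast, category 0 in the other.
# Category 0 ranks above category 3 in the risk ordering.
result = woman_level((3, 2), (0, 3))
```

Keeping the assessment and recommendation together as one tuple is what ties the recommendation to the same breast as the selected assessment.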

Each case was classified as having received a recommendation consistent with the BI-RADS assessment if one of the following conditions (a–d) was met based on the third edition of the BI-RADS: (a) assessment of "need additional imaging evaluation" (category 0) with additional views or ultrasonography recommended, (b) assessment of "negative" or "benign finding" (category 1 or 2) with a normal interval follow-up recommended, (c) assessment of "probably benign finding" (category 3) with a recommendation for short interval follow-up, and (d) assessment of "suspicious abnormality" or "highly suggestive of malignancy" (category 4 or 5) with a recommendation for clinical examination or surgical consult or for biopsy or fine-needle aspiration.
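Conditions (a) through (d) amount to a lookup from assessment category to the set of expected recommendations. The sketch below is illustrative only: the label strings and the name `is_consistent` are hypothetical, and for categories 4 and 5 it treats either clinical examination or surgical consult, or biopsy or fine-needle aspiration, as consistent, which is how the Results group them.

```python
# Expected recommendations per BI-RADS category, following conditions (a)-(d).
# Label strings are illustrative, not the registries' actual codes.
TISSUE_OR_CONSULT = {
    "clinical examination or surgical consult",
    "biopsy or fine-needle aspiration",
}
EXPECTED = {
    0: {"additional imaging"},          # (a) incomplete assessment
    1: {"normal interval follow-up"},   # (b) negative
    2: {"normal interval follow-up"},   # (b) benign finding
    3: {"short interval follow-up"},    # (c) probably benign finding
    4: TISSUE_OR_CONSULT,               # (d) suspicious abnormality
    5: TISSUE_OR_CONSULT,               # (d) highly suggestive of malignancy
}


def is_consistent(category, recommendation):
    """True when the recommendation is the one expected for the category."""
    return recommendation in EXPECTED[category]


# Example: a "probably benign finding" sent for additional imaging is
# counted as inconsistent under this rule.
flag = is_consistent(3, "additional imaging")
```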

Cancer Rates
Because preliminary findings suggested some potential discord between how assessments of abnormalities were associated with recommendations, we also calculated cancer rates per 1,000 examinations for categories 0, 3, 4, and 5. BCSC registry data include cancers identified through local Surveillance Epidemiology and End Results registries, state cancer registries, and pathology databases. A file of uniquely encrypted records of women with cancer can then be linked to the file of assessments for those women. We (S.H.T., L.E.I.) linked the index examinations to cancer occurrence within the next 365 days for the subset of registries where the data were available (Colorado, North Carolina, New Hampshire, Washington [one registry], and Vermont).

Data Analysis
We (S.H.T., L.E.I.) examined the data in the following three ways: (a) first assessment by mammography registry, (b) first recommendation by mammography registry, and (c) first assessment by recommendation for all registries combined. We then evaluated the proportion of assessments having recommendations consistent with the BI-RADS by age categories (40–49 years, 50–59 years, 60–64 years, and 65 years or older) and prior mammography experience (<1 year, 1–2 years, 3–4 years, >5 years, and never) for each assessment category by using a χ² statistic to compare proportions. A P value of less than .05 was considered to indicate a statistically significant difference.

	RESULTS

The Breast Cancer Surveillance Consortium population reported here includes 292,795 women with the following ages and mammography experience: (a) age: 32% (n = 93,389) 40–49 years, 41% (n = 119,487) 50–64 years, and 27% (n = 79,919) 65 years or older; and (b) mammography experience: 94% (n = 273,952) reported ever having had a mammogram. Among those for whom we had the relevant data, women had the following characteristics: (a) race (n = 237,815): 90% (n = 213,278) Caucasian; 6% (n = 14,652) African-American; 0.5% (n = 1,145) Native American or Eskimo; 2% (n = 5,214) Asian or Pacific Islander; and 1.5% (n = 3,526) Other; (b) Hispanic ethnicity: 3.5% (7,404 of 213,069); (c) education (n = 220,562): 9% (n = 20,387) less than high school; 29% (n = 63,470) high school graduate; 28% (n = 61,591) some college; 34% (n = 75,114) college or postgraduate; and (d) mammography experience: 70% (134,842 of 191,842) had a mammogram within the previous 2 years.

Table 2 shows the first assessments for the index examinations (n = 292,795) for 1997. A majority (n = 269,022; 91.88%) of women had "negative" or "benign finding" assessments (category 1 or 2, respectively), but registries differed in the proportion of each. Across all registries combined, the "negative" assessment was used much more commonly (78.89%) than the "benign finding" assessment (12.99%). Not shown in the Table is that cancer rates were 10 in 1,000 examinations for category 3, 34 in 1,000 for category 0, 167 in 1,000 for category 4, and 684 in 1,000 for category 5.

TABLE 2. First Mammography Assessment for Index Screening Examinations for 1997

Table 3 shows the recommendations among all women with an assessment (n = 292,795). As noted earlier, the majority (n = 290,804; 99.3%) had a single recommendation, but some (n = 1,991; 0.7%) had two or more. In cases where two recommendations were given, one is listed in the table based on the ordinal scale described in Materials and Methods. Normal interval follow-up was the most common recommendation (92.06%). Additional imaging was the next most common recommendation (5.13%). Short interval follow-up (2.05%) was less common. Biopsy or fine-needle aspiration (0.4%) and clinical examination or surgical consult (0.36%) were the least common recommendations resulting from the screening mammogram.

TABLE 3. Frequency of Recommendations for 1997

Table 4 shows the recommendations by assessments. The percentages reflect the proportion receiving a recommendation within an assessment category. The assessment and recommendation were consistent with the BI-RADS for categories 1, 2, and 0. For example, "incomplete assessments" (category 0) were almost always (95.5%) associated with a recommendation for additional imaging. Similarly, 99.62% of the women with "negative" assessments and 97.27% of the women with a "benign finding" were recommended for normal interval follow-up. A small proportion (28 [0.01%] of 292,795) of the examinations had the highest assessment and highest management recommendation in opposite breasts.

TABLE 4. Recommendations according to Mammography Assessment for 1997

Assessments and recommendations were less consistent with the BI-RADS among women with assessment categories 3, 4, and 5. Among women with a "probably benign finding," a moderate proportion (40.32%) was recommended for "short interval follow-up," but "additional imaging" also was common (36.88%), and many women (18.77%) were recommended for "normal interval follow-up." Additional imaging was recommended for a relatively high proportion of women (38.71%) with an assessment of "suspicious abnormality." Even among women with an assessment "highly suggestive of malignancy," 6.58% were recommended for additional imaging.

The proportion of management recommendations consistent with assessments differed across age categories for "probably benign finding" (category 3) (P < .001) and "suspicious abnormality" (category 4) (P < .001). Among women ages 40–49, 50–59, 60–64, and 65 years or older, the proportion recommended for short interval follow-up was 38.9% (1,615 of 4,155), 39.2% (1,396 of 3,559), 33.6% (386 of 1,147), and 43.3% (1,299 of 3,000), respectively. Among women ages 40–49, 50–59, 60–64, and 65 years of age or older in category 4, the proportions recommended for biopsy, fine-needle aspiration, or clinical examination or surgical consult were 52.3% (262 of 501), 56.1% (249 of 444), 54.2% (96 of 177), and 65.6% (330 of 503), respectively. Among women in category 4, a higher proportion were recommended for additional imaging rather than biopsy, fine-needle aspiration, or clinical examination or surgical consult among women ages 40–49 years (220 [43.9%] of 501) compared with women ages 65 years and older (160 [31.8%] of 503).
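The age comparison for category 3 can be approximated from the counts quoted above with a Pearson χ² test on a 2 × 4 table. The sketch below is an illustration, not the authors' analysis code: the function name `chi_square_2xk` is hypothetical, every woman not recommended for short interval follow-up is assumed to fall in the single "other recommendation" row, and 16.266 is the standard tabulated critical value for 3 degrees of freedom at P = .001.

```python
def chi_square_2xk(successes, totals):
    """Pearson chi-square statistic for comparing k proportions.

    successes[i] of totals[i] women in group i had the outcome of interest.
    Returns (statistic, degrees_of_freedom).
    """
    pooled = sum(successes) / sum(totals)  # overall proportion under H0
    statistic = 0.0
    for observed_yes, n in zip(successes, totals):
        expected_yes = n * pooled
        expected_no = n - expected_yes
        observed_no = n - observed_yes
        statistic += (observed_yes - expected_yes) ** 2 / expected_yes
        statistic += (observed_no - expected_no) ** 2 / expected_no
    return statistic, len(totals) - 1


# Category 3 counts by age group (40-49, 50-59, 60-64, 65 or older),
# taken from the paragraph above.
short_interval = [1615, 1396, 386, 1299]
category3_totals = [4155, 3559, 1147, 3000]

statistic, df = chi_square_2xk(short_interval, category3_totals)

# Tabulated critical value of the chi-square distribution, df = 3, P = .001.
CRITICAL_DF3_P001 = 16.266
significant_at_001 = statistic > CRITICAL_DF3_P001
```

Under these assumptions the statistic exceeds the critical value, which agrees with the reported P < .001 for category 3.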

The proportion of consistent recommendations differed for women with probably benign assessments whose last mammograms were more recent (P < .001). For women with a mammogram obtained within less than 1 year previously, 1–2 years, 3–4 years, more than 5 years, and never, the proportion recommended for short interval follow-up was 40.7% (374 of 919), 33.7% (1,369 of 4,063), 41.1% (546 of 1,328), 48.9% (387 of 791), and 53.7% (780 of 1,453), respectively, among women with a "probably benign finding" assessment. Also among women with a "probably benign finding" assessment, 28% (257 of 919) of those whose most recent prior mammogram occurred within less than 1 year were recommended for normal interval follow-up compared with 21.0% (852 of 4,063) of those whose most recent mammogram occurred 1–2 years previously. Additional imaging was recommended 26.7% (245 of 919) and 41.6% (1,688 of 4,063) of the time among women whose most recent prior mammogram occurred less than 1 year and 1–2 years earlier, respectively.

	DISCUSSION

In this study, we examine the use of the BI-RADS assessments and management recommendations for the largest population of U.S. women reported to date to our knowledge. The population in this study is demographically similar to the random sample of U.S. women in the 1997 Behavioral Risk Factor Surveillance Survey (13) except for a smaller proportion of women in the oldest and youngest age categories. We found that most "negative" or "benign finding" assessments are associated with a recommendation that is consistent with the BI-RADS, and these screening assessment categories account for 91.9% (269,022 of 292,795) of all women. This consistency reflects a strong contribution of the BI-RADS to achieving the intentions of the creators of the reporting system. However, there is some inconsistency with the BI-RADS for assessments and management recommendations among the 8.1% (n = 23,773) of women with a breast abnormality. While some of the inconsistency may be addressed by teaching, clarifications in documentation, and changes in the reporting system, the inconsistency may also reflect clinical practice that needs further exploration. For example, the BI-RADS categories 3 and 4 show inconsistency that differs by patient age and when the previous mammogram occurred. This systematic inconsistency suggests that there may be something about the clinical practice of screening mammography that is not currently accommodated by the BI-RADS.

Free text might explain some apparent inconsistencies between assessments and recommendations in our data, although the text was not available for analysis. Rarely (0.11%, category 1; 0.42%, category 2; 3.13%, category 3), women with a negative, benign, or probably benign assessment were referred for clinical and/or surgical evaluation. While an incorrect use of the BI-RADS might explain the discrepancy, it is also possible that the discrepancy is due to the presence of a clinical finding. Even though the examination was intended for screening, findings occur, and depending on that finding the radiologist may recommend surgical evaluation or reexamination by the referring clinician. A limitation of these data is that we cannot separate a recommendation for clinical evaluation from one for a surgical evaluation. The surgical recommendation is inconsistent with the BI-RADS, but the text might show that it was clinically indicated if a physical finding was present. Automated mammography reporting systems might include standard text to address this situation, in which clinical findings are driving the recommendation.

Inconsistencies are much more common among the 8.1% of examinations with a breast abnormality on the image, particularly among examinations where there is a "probably benign" or "suspicious" abnormality. When assessments were recorded for each breast, a small proportion of the time (0.01% of all assessments), the most serious recommendation occurred in the breast that was contralateral to the most serious assessment. While it is a small proportion, it points out the need to confirm the laterality of any lesion before subsequent evaluation or biopsy.

Assessments and management recommendations for "probably benign" lesions are not completely consistent with the BI-RADS. Use of the assessment has been shown to vary with facility and age (14), and some controversy exists about whether it should ever be used until after full problem-solving imaging (15,16). Despite the interpretive challenges, the group of lesions appears to be highly predictive of a benign lesion in some centers, and associated with a cancer rate of 1% in our data (9,17). This suggests that the assessment category is being used correctly, since the expected rate of cancer for these lesions is less than 2% (18).

Since the cancer rate among the "probably benign" lesions is low, the problem with the implementation of the BI-RADS category 3 appears to be with the associated management recommendation. While a relatively high proportion of women with "probably benign finding" assessments received the expected short interval follow-up recommendation (4,782 [40.3%] of 11,861), a high proportion were recommended for additional imaging (n = 4,374; 36.9%), and some were recommended for "normal interval follow-up" (n = 2,227; 18.8%). We hypothesize that the high proportion recommended for additional imaging may reflect interest in conveying a level of concern to a referring clinician. The radiologist may want clarification of an ambiguous lesion that is judged likely to be cystic and therefore order additional imaging with ultrasonography. However, the radiologist may want to be clear that the risk of cancer is low so that the referring physician’s discussion conveys the appropriate level of breast cancer risk to the woman. The substantial proportion of women with annual follow-up may be due to a suggested protocol of following the abnormality annually after two 6-month follow-up mammograms (17). Our result that women with a "probably benign finding" were more likely to have a recommendation for annual mammography if they had a mammogram in the previous year supports this interpretation, but consideration should also be given to other explanations. For example, a recent prior mammogram was associated with less use of additional evaluation, suggesting that the comparison film provided a reassuring reference. This same reassurance might explain why annual follow-up, rather than 6-month follow-up, was recommended. However, more research into radiologists’ thinking in these circumstances is needed before we can provide an adequate explanation.

Improvements in the BI-RADS use might occur through some modifications in documenting instructions and teaching, although our results show that the current terminology establishes consistent recommendations for a majority of women. For example, the BI-RADS instructions might further improve consistency by specifying how to categorize "probably benign findings" that appear stable after 6- and 12-month follow-up assessments. Consideration could also be given to stating that such follow-up studies be classified as diagnostic. Teaching about the intended use of the BI-RADS category 0 could improve consistency if it meant that the category was used more frequently. The BI-RADS committee intended that additional imaging occur after category 0 assessments (Sickles EA, oral communication, 2001). Screening interpretations would therefore include categories 1, 2, and 0. Any additional imaging would be done after a category 0 interpretation, and then a category 1–5 would be assigned. It is expected that less than 10% of screening examinations would require recall for additional imaging (8). We report 5.1% of women being recommended for additional imaging, but they are distributed among several assessments, such that category 0 accounts for only 3.43% of all interpretations. Teaching might reduce recommendations for additional imaging after assessments other than "category 0, need additional imaging."

Changes in terminology could also be considered to further improve consistency. For example, we found that categories 4 and 5 assessments commonly include a proportion of women recommended for additional imaging. As noted earlier, this is not the intended use of the BI-RADS and may reflect a misunderstanding of how to use the categories. However, it may also be that the radiologist wants the evaluation of associated abnormalities to occur before a biopsy of an obvious lesion. The referring physician is therefore provided with a recommendation for additional imaging in order for an evaluation to be done in the appropriate order and to convey the radiologist’s level of concern. Because we commonly found additional imaging recommended among category 3, 4, and 5 assessments, some consideration could be given to modifying category 0 in order for the level of concern to be stated clearly, such as "incomplete assessment of a probably benign finding" and "incomplete assessment of a suspicious finding." In the former group, the imaging would be done primarily to clarify an ambiguity. In the latter group, the additional imaging would be done to clarify associated lesions and establish the extent of disease. Whether such a change would lead to more confusion or increased consistency needs further evaluation. It seems unlikely that teaching alone will eliminate the problem, if the current inconsistent use addresses a need to convey the radiologist’s level of concern. Now that the Mammography Quality Standards Act is in place, and more teaching has occurred about the BI-RADS, it will be important to see whether the inconsistency has been reduced.

One might also conclude that the BI-RADS should be implemented with only the assessment and recommendation linked by the recording software. However, implementation of recording software is still underway throughout the U.S., and many radiologists continue to dictate the assessments and recommendations. Education may increase consistency in the use of the BI-RADS terminology, but more needs to be done to evaluate the clinical concerns that may be limiting radiologists’ association of assessments and recommendations in practice. Such work would be facilitated by software that allowed automated coding to indicate that a recommendation is based on a clinical finding when there is no mammographic finding, since the BI-RADS is intended to address only mammographic findings. However, radiologists must guide the care of people, and so recommendations may be based on more than mammographic findings. Some facilities within our registries and some recording software have already implemented wording to explain management recommendations based on clinical knowledge. Such software might also identify inconsistencies between assessments and recommendations that might avoid the rare event (0.01%) noted in our data, in which the highest assessment and recommendation appeared to be in opposite breasts.

A limitation of this work is that these data were from 1997, before the Mammography Quality Standards Act required the recording of the BI-RADS assessments. However, the American College of Radiology first published its BI-RADS in 1993, and radiologists in these registries had been using the terminology since at least 1996 and, in many cases, well before that time (1,3). The Food and Drug Administration requirement for using the BI-RADS assessments was implemented in April of 1999 and does not require their linkage to recommendations. It is therefore unlikely that the requirement itself will change the relationship between the assessments and the recommendations, but future research should evaluate this possibility.

Another limitation of this study is that it is not possible to assess how the recording systems used might have limited the association of interpretations and recommendations. The ability to independently record assessments and recommendations was a feature of recording sheets and software that was requested by radiologists at the start of this study. Registry sites used slightly different approaches but most simply had paper systems where the assessment and recommendation could each be indicated. Some facilities continued to dictate in addition to recording the findings on a paper form. While the structure of the form may have allowed the dissociation between the BI-RADS assessments and recommendations, it is impossible to assess the magnitude of the effect on this dissociation of differences in the recording systems themselves. It seems unlikely that the forms themselves did anything more than allow radiologists to record assessments and recommendations in accordance with their practice.

In conclusion, we provide evidence of how well terminology consistent with the BI-RADS was implemented in practice by 1997. While BI-RADS terminology was used consistently for the 91.9% of women with a "negative" or "benign finding" assessment, there is room for improvement in its use among the 8.1% of women who have a mammographic finding that needs additional attention.

The apparent inconsistency in use may also be important for planning educational efforts and for future research. Although the radiologists in this study all used the BI-RADS, that practice was not a requirement until April of 1999 (Mammography Quality Standards Act). In accordance with the requirement, the American College of Radiology has paid a great deal of attention to education regarding the BI-RADS terminology, including publishing the third edition of the BI-RADS manual, mailing the manual to all American College of Radiology–accredited facilities, developing a continuing medical education course on the use of the BI-RADS, and creating a CD-ROM–based mammography interpretive skills self-assessment examination that emphasizes proper use of BI-RADS terminology (Sickles EA, oral communication, 2001) (3). Future teaching might address the inconsistencies reported here by stressing the appropriate use of "need additional imaging evaluation" and the distinction between "incomplete" and "final" assessments for screening interpretations, which is currently well documented in the BI-RADS manual. Future work should evaluate the effect of educational efforts, whether the inconsistencies reported here appear in other settings, whether the inconsistencies persist over time, and whether modifications in the reporting system are needed to address clinical realities.

	REFERENCES


1. Kopans DB, D’Orsi CJ, Adler DD, et al. Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 1993.
2. D’Orsi CJ, DeBor M. Reporting and communication. In: Bassett LW, Jackson VP, Jahan R, Fu YS, Gold RH, eds. Diagnosis of disease of the breast. Philadelphia, Pa: Saunders, 1997; 103-125.
3. D’Orsi CJ, Bassett LW, Feig S, et al. American College of Radiology (ACR): Breast Imaging Reporting and Data System (BI-RADS). 3rd ed. Reston, Va: American College of Radiology, 1998.
4. Tabar L, Vitak B, Chen HH, et al. The Swedish Two-County Trial twenty years later: updated mortality results and new insights from long-term follow-up. Radiol Clin North Am 2000; 38:625-651.
5. Kerlikowske K, Grady D, Rubin SM, Sandrock C, Ernster VL. Efficacy of screening mammography: a meta-analysis. JAMA 1995; 273:149-154.
6. Ballard-Barbash R, Taplin SH, Yankaskas BC, et al. Breast Cancer Surveillance Consortium: a national mammography screening and outcomes database. AJR Am J Roentgenol 1997; 169:1001-1008.
7. Food and Drug Administration. Quality Mammography Standards. Final Rule, 21 CFR Parts 16 and 900 [Docket No. 95N-0192], RIN 0910-AA24. Washington, DC: Department of Health and Human Services, 1997 Oct 28.
8. Linver MN, Osuch JR, Brenner RJ, Smith RA. The mammography audit: a primer for the mammography quality standards act (MQSA). AJR Am J Roentgenol 1995; 165:19-25.
9. Orel SG, Kay N, Reynolds C, Sullivan DC. BI-RADS categorization as a predictor of malignancy. Radiology 1999; 211:845-850.
10. Liberman L, Feng TL, Dershaw DD, Morris EA, Abramson AF. US-guided core breast biopsy: use and cost effectiveness. Radiology 1998; 208:717-723.
11. Berg WA, Campassi C, Langenberg P, Sexton MJ. Breast imaging reporting and data system: inter- and intraobserver variability in feature analysis and final assessment. AJR Am J Roentgenol 2000; 174:1769-1777.
12. Carney PA, Geller BM, Moffett H, et al. Current medico-legal issues and confidentiality issues in a large multi-center research program: the National Cancer Institute’s Breast Cancer Surveillance Consortium. Am J Epidemiol 2000; 15:371-378.
13. National Center for Chronic Disease Prevention and Health Promotion. Behavioral risk factor surveillance system 1984-1995 survey data. Behavioral Surveillance Branch CD-ROM. Washington, DC: Centers for Disease Control and Prevention, U.S. Department of Health and Human Services; Series 1, No. 1.
14. Caplan LS, Blackman D, Nadel M, Monticciolo DL. Coding mammograms using the classification "probably benign finding short interval follow-up suggested." AJR Am J Roentgenol 1999; 172:339-342.
15. Hall FM. Probably benign breast nodules: follow-up of selected cases without initial full-problem solving imaging (editorial). Radiology 1995; 194:305.
16. Sickles EA. Probably benign breast nodules: follow-up of all cases requires initial full problem-solving imaging. Radiology 1995; 194:305-306.
17. Sickles EA. Probably benign breast lesions: when should follow-up be recommended and what is the optimal follow-up protocol? Radiology 1999; 213:11-14.
18. Sickles EA. Management of probably benign breast lesions. Radiol Clin North Am 1995; 33:1123-1130.

Authors: Stephen H. Taplin, MD, MPH; Laura E. Ichikawa, MS. 2007-5-12