1 From the Center for Health Studies, Group Health Cooperative of Puget Sound, 1730 Minor Ave, Suite 1600, Seattle, WA 98101 (S.H.T.); Affiliations for all other authors and author contributions are listed at the end of this article. Received March 20, 2001; revision requested April 23; revision received June 20; accepted August 9. Supported by grants U01CA86076 (W.E.B., L.E.I.), U01CA86082 (P.A.C.), U01CA63736 (M.B.D.), U01CA70013 (B.M.G.), U01CA63740 (K.K., E.A.S., V.L.E.), U0169976 (R.D.R.), U01CA63731 (S.H.T.), CAU01CA70040 (B.C.Y.), and R01CA63146 (N.U.) from the National Cancer Institute.
ABSTRACT |
---|
MATERIALS AND METHODS: Seven Breast Cancer Surveillance Consortium mammography registries recorded screening assessments and recommendations in 1997 to identify the proportion of women in each BI-RADS category. The first screening assessment for a woman without cancer or a prior mammogram within 9 months was associated with its independently recorded recommendation.
RESULTS: Among 292,795 women, screening assessments included 269,022 (91.9%) with a "negative" or "benign finding," and 267,103 (99.3%) of these women were recommended for normal interval follow-up. Among 11,861 (4.1%) women with screening assessments of "probably benign finding," 4,782 (40.3%) were recommended for short interval follow-up as expected on the basis of the BI-RADS, but a high proportion (36.9%) were recommended for additional imaging. Among 1,625 (0.6%) women with "suspicious abnormality," most were recommended for biopsy (48.7%) or clinical examination and/or surgical consult (9.0%), but many were recommended for additional imaging (38.7%). Among 243 (0.1%) women with screening assessments "highly suggestive of malignancy," a majority were recommended for biopsy (73.3%) or clinical examination and/or surgical consult (18.1%) consistent with BI-RADS, but some were recommended for additional imaging (6.6%).
CONCLUSION: BI-RADS assessments and management recommendations are consistent for negative and benign assessments, but inconsistencies were found in assessments and recommendations for mammographic abnormalities.
Index terms: Breast, ACR Reporting and Data System • Breast neoplasms, diagnosis, 00.30 • Breast radiography, quality assurance, 00.11 • Cancer screening, 00.11, 00.30
INTRODUCTION |
---|
To our knowledge, no one has evaluated the use of the American College of Radiology BI-RADS assessment and management recommendations across multiple facilities and geographic locations in routine clinical practice. Important work has been done to evaluate the positive predictive value of BI-RADS assessments (9,10) and between-reader variation in the findings descriptions and management recommendations (11). However, these studies used unique populations or specifically drawn samples. The purpose of this study was to examine how frequently BI-RADS screening assessments were associated with the expected clinical management recommendation in diverse routine practice settings.
MATERIALS AND METHODS |
---|
Subjects
This study includes mammographic information for women 40 years of age and older without a personal history of breast cancer who underwent a screening examination between January 1 and December 31, 1997, at any of the 158 participating mammography facilities within the seven BCSC registries. Throughout the manuscript, "facility" refers to the physical location where mammographic imaging is performed. A mammography registry refers to the sum of all the included facilities (range, 2–49 facilities per registry). The number of radiologists interpreting mammograms within a registry ranges from 10 to 165. The number of screening examinations per registry ranges from 11,563 to 80,911.
This report is restricted to 292,795 screening examinations, each identified as the first occurrence for a woman in the study period after the following exclusions: women with a prior history of breast cancer, women with additional imaging during the same visit, examinations with no recorded assessment, and examinations with a diagnostic indication. To further ensure that we included only screening mammograms, we also excluded any examination in 1997 that was preceded by a mammogram within 9 months.
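The selection of index examinations described above can be sketched in code. This is a minimal illustration, not the registries' actual processing: the `Exam` record, its field names, and the 274-day approximation of "9 months" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "within 9 months" is approximated here as fewer than 274 days.
NINE_MONTHS = timedelta(days=274)
STUDY_START, STUDY_END = date(1997, 1, 1), date(1997, 12, 31)


@dataclass
class Exam:
    """Hypothetical examination record; field names are illustrative only."""
    woman_id: str
    exam_date: date
    indication: str                     # "screening" or "diagnostic"
    assessment: Optional[int]           # BI-RADS category 0-5, None if not recorded
    prior_breast_cancer: bool
    additional_imaging_same_visit: bool
    previous_mammogram: Optional[date]  # most recent earlier mammogram, if any


def is_eligible(exam: Exam) -> bool:
    """Apply the exclusions listed in the text to a single examination."""
    if not (STUDY_START <= exam.exam_date <= STUDY_END):
        return False
    if exam.prior_breast_cancer or exam.additional_imaging_same_visit:
        return False
    if exam.assessment is None or exam.indication != "screening":
        return False
    if exam.previous_mammogram is not None:
        if exam.exam_date - exam.previous_mammogram < NINE_MONTHS:
            return False
    return True


def index_exams(exams):
    """Return {woman_id: first eligible examination in the study period}."""
    index = {}
    for exam in sorted(exams, key=lambda e: e.exam_date):
        if is_eligible(exam) and exam.woman_id not in index:
            index[exam.woman_id] = exam
    return index
```

In this sketch the exclusions are applied before the first examination is chosen, which matches the wording "first occurrence ... after the following exclusions."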
Measurements
This study examined BI-RADS use by measuring the following: (a) the proportion of women assigned BI-RADS assessment categories 0–5 across the BCSC registries, (b) the proportion of women assigned specific recommendations, and (c) the proportion of assessments having the expected recommendation according to the BI-RADS.
We (S.H.T., L.E.I.) included the first recorded "routine screening" examination during the study for women in the registry and referred to it as the "index" examination. Demographic information (age, race, ethnicity, education), prior mammography experience, and assessments are collected in the course of usual care and pooled in a registry before being transferred periodically to the BCSC Statistical Coordinating Center. Encrypted unique identifiers for women, facilities, and radiologists maintain the anonymity of all data (12). The unique encrypted code is used to link records in the Statistical Coordinating Center over time (12).
Recording systems varied across the diverse facilities, but all included paper or automated summaries with an assessment and a management recommendation. Each assessment and management recommendation was recorded separately by the radiologist for each mammographic examination by using terminology consistent with the BI-RADS (described more fully later). The detailed text that might explain a recommendation could not be systematically recorded and summarized across facilities and was therefore not available for analysis. Examinations in which recording software created an automatic link between an assessment and management recommendation were excluded. This exclusion removed one registry (New Mexico) and several facilities that together accounted for 38,526 (11%) of the 350,632 women otherwise eligible for the study.
Assessments by radiologists at the study facilities corresponded with the six BI-RADS categories (Table 1) (3). These categories include one incomplete (category 0) and five final assessments (categories 1–5). Some facilities recorded two assessments at each examination, one for each breast, while others recorded a single assessment for each woman.
Recommendations for clinical follow-up were collected on the same form but recorded separately from the assessment. A single recommendation was given for 290,804 (99.3%) of all 292,795 assessments. For the 1,991 (0.7%) assessments with multiple recommendations, we included in this analysis the highest numbered recommendation among the following five ordered recommendations: (a) normal interval follow-up, (b) short interval follow-up (ie, <1 year), (c) needs additional imaging, (d) clinical examination or surgical consult, and (e) biopsy or fine-needle aspiration. The order reflects an increasing likelihood of malignancy. The data do not allow the separation of clinical examination and surgical consult recommendations. Women (14,814 [4.8%] of 308,003) with no recorded recommendation were categorized as "unknown" and not included in the tables or analysis. Some women (394 [0.1%] of 308,003) were given recommendations other than those listed earlier, which were recorded as "other" but not included in the tables. Recording of additional imaging did not include examinations to correct for technical deficiencies.
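The rule for examinations with multiple recommendations can be illustrated with a short sketch. The ranks follow the ordered list (a) through (e) above; the function name and the handling of a mix of listed and unlisted recommendations are assumptions made for this example.

```python
# Ordered recommendations (a)-(e) from the text, ranked 1 (low) to 5 (high).
RECOMMENDATION_RANK = {
    "normal interval follow-up": 1,
    "short interval follow-up": 2,
    "needs additional imaging": 3,
    "clinical examination or surgical consult": 4,
    "biopsy or fine-needle aspiration": 5,
}


def select_recommendation(recommendations):
    """Return the highest-numbered listed recommendation for one assessment.

    An empty list is reported as "unknown"; recommendations outside the five
    listed ones are reported as "other". Assumption: if listed and unlisted
    recommendations are mixed, only the listed ones are ranked.
    """
    if not recommendations:
        return "unknown"
    ranked = [r for r in recommendations if r in RECOMMENDATION_RANK]
    if not ranked:
        return "other"
    return max(ranked, key=RECOMMENDATION_RANK.get)
```

For example, an assessment recorded with both "short interval follow-up" and "needs additional imaging" would be analyzed as "needs additional imaging."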
Because some registries recorded assessments and management recommendations for each breast, we established a systematic method for assigning the assessment and management recommendation for a woman. We (S.H.T., L.E.I.) rank ordered the assessments based on potential cancer risk ("negative," "benign finding," "probably benign finding," "need additional imaging evaluation," "suspicious abnormality," "highly suggestive of malignancy"). When assessments were not the same for both breasts we (S.H.T., L.E.I.) assigned the woman the higher assessment. The recommendations for the breast were ordered from 1 (low) to 5 (high) as shown. When examining the relationship between assessments and recommendations, the breast with the higher assessment was included for each woman and associated with the recommendation for the same breast.
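As a minimal sketch of this per-woman assignment, each breast can be represented as a pair of assessment and recommendation rank. The rank order follows the text; the function name and the tie-break used when both breasts have the same assessment (the text does not specify one) are assumptions.

```python
# Assessments rank ordered by potential cancer risk, as listed in the text.
ASSESSMENT_RANK = {
    "negative": 1,
    "benign finding": 2,
    "probably benign finding": 3,
    "need additional imaging evaluation": 4,
    "suspicious abnormality": 5,
    "highly suggestive of malignancy": 6,
}


def assign_woman_level(breasts):
    """Pick the breast with the higher assessment and keep its own recommendation.

    `breasts` is a list of (assessment, recommendation_rank) pairs, where
    recommendation_rank runs from 1 (low) to 5 (high).
    Assumption: when both breasts have the same assessment, the pair with the
    higher recommendation rank is used.
    """
    return max(breasts, key=lambda b: (ASSESSMENT_RANK[b[0]], b[1]))
```

Because the recommendation is taken from the same breast as the higher assessment, a more serious recommendation recorded for the other breast is not carried over to the woman-level record.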
Each case was classified as having received a recommendation consistent with the BI-RADS assessment if one of the following conditions (a–d) was met based on the third edition of the BI-RADS: (a) assessment of "need additional imaging evaluation" (category 0) with additional views or ultrasonography recommended, (b) assessment of "negative" or "benign finding" (category 1 or 2) with a normal interval follow-up recommended, (c) assessment of "probably benign finding" (category 3) with a recommendation for short interval follow-up, or (d) assessment of "suspicious abnormality" or "highly suggestive of malignancy" (category 4 or 5) with a recommendation for either clinical examination or surgical consult, or biopsy or fine-needle aspiration.
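Conditions (a) through (d) amount to a lookup from assessment category to the set of expected recommendations. The sketch below is illustrative only; it reads condition (d) as satisfied by either of the two recommendations, an assumption that matches how the Results group them, and the names are hypothetical.

```python
# Expected recommendations per BI-RADS category, following conditions (a)-(d).
EXPECTED_RECOMMENDATIONS = {
    0: {"needs additional imaging"},
    1: {"normal interval follow-up"},
    2: {"normal interval follow-up"},
    3: {"short interval follow-up"},
    # Assumption: either recommendation counts as consistent for categories 4 and 5.
    4: {"clinical examination or surgical consult", "biopsy or fine-needle aspiration"},
    5: {"clinical examination or surgical consult", "biopsy or fine-needle aspiration"},
}


def is_consistent(category, recommendation):
    """Return True if the recommendation is the expected one for the category."""
    return recommendation in EXPECTED_RECOMMENDATIONS[category]
```

Under this rule, a category 4 assessment paired with "needs additional imaging" is counted as inconsistent.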
Cancer Rates
Because preliminary findings suggested some discordance in how assessments of abnormalities were associated with recommendations, we also calculated cancer rates per examination for categories 0, 3, 4, and 5. BCSC registry data include cancers identified through local Surveillance Epidemiology and End Results registries, state cancer registries, and pathology databases. A file of uniquely encrypted records of women with cancer can then be linked to the file of assessments for those women. We (S.H.T., L.E.I.) linked the index examinations to cancer occurrence within the next 365 days for the subset of registries where the data were available (Colorado, North Carolina, New Hampshire, Washington [one registry], and Vermont).
Data Analysis
We (S.H.T., L.E.I.) examined the data in the following three ways: (a) first assessment by mammography registry, (b) first recommendation by mammography registry, and (c) first assessment by recommendation for all registries combined. We then evaluated the proportion of assessments having recommendations consistent with the BI-RADS by age categories (40–49 years, 50–59 years, 60–64 years, and 65 years or older) and prior mammography experience (<1 year, 1–2 years, 3–4 years, >5 years, and never) for each assessment category by using a χ² statistic to compare proportions. A P value of less than .05 was considered to indicate a statistically significant difference.
RESULTS |
---|
Table 2 shows the first assessments for the index examinations (n = 292,795) for 1997. A majority (n = 269,022; 91.88%) of women had "negative" or "benign finding" assessments (category 1 or 2, respectively), but registries differed in the proportion of each. Across all registries combined, the "negative" assessment was used much more commonly (78.89%) than the "benign finding" assessment (12.99%). Not shown in the Table is that cancer rates were 10 in 1,000 examinations for category 3, 34 in 1,000 for category 0, 167 in 1,000 for category 4, and 684 in 1,000 for category 5.
The proportion of management recommendations consistent with assessments differed across age categories for "probably benign finding" (category 3) (P < .001) and "suspicious abnormality" (category 4) (P < .001). Among women ages 40–49, 50–59, 60–64, and 65 years or older, the proportion recommended for short interval follow-up was 38.9% (1,615 of 4,155), 39.2% (1,396 of 3,559), 33.6% (386 of 1,147), and 43.3% (1,299 of 3,000), respectively. Among women ages 40–49, 50–59, 60–64, and 65 years of age or older in category 4, the proportions recommended for biopsy, fine-needle aspiration, or clinical examination or surgical consult were 52.3% (262 of 501), 56.1% (249 of 444), 54.2% (96 of 177), and 65.6% (330 of 503), respectively. Among women in category 4, a higher proportion were recommended for additional imaging rather than biopsy, fine-needle aspiration, or clinical examination or surgical consult among women ages 40–49 years (220 [43.9%] of 501) compared with women ages 65 years and older (160 [31.8%] of 503).
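The age comparison for category 3 can be approximated from the counts reported in this paragraph. The sketch below computes a Pearson χ² statistic for a table of consistent versus other recommendations by age group, with a closed-form tail probability for integer degrees of freedom. It is an illustration under the assumption that the comparison was a test of independence on this 4 × 2 table; the function names are hypothetical and the authors' exact procedure may have differed.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Category 3 by age group (40-49, 50-59, 60-64, 65 or older):
# [recommended for short interval follow-up, all other recommendations]
category3_by_age = [
    [1615, 4155 - 1615],
    [1396, 3559 - 1396],
    [386, 1147 - 386],
    [1299, 3000 - 1299],
]

stat, df = pearson_chi2(category3_by_age)
p_value = chi2_sf(stat, df)
```

With these counts the statistic lies well above 16.27, the critical value for P = .001 at 3 degrees of freedom, which is in line with the reported P < .001.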
The proportion of consistent recommendations for women with probably benign assessments differed by the time since the last mammogram (P < .001). Among women with a "probably benign finding" assessment whose previous mammogram was obtained less than 1 year, 1–2 years, 3–4 years, or more than 5 years earlier, and among those with no previous mammogram, the proportion recommended for short interval follow-up was 40.7% (374 of 919), 33.7% (1,369 of 4,063), 41.1% (546 of 1,328), 48.9% (387 of 791), and 53.7% (780 of 1,453), respectively. Also among women with a "probably benign finding" assessment, 28.0% (257 of 919) of those whose most recent prior mammogram occurred less than 1 year earlier were recommended for normal interval follow-up compared with 21.0% (852 of 4,063) of those whose most recent mammogram occurred 1–2 years previously. Additional imaging was recommended 26.7% (245 of 919) and 41.6% (1,688 of 4,063) of the time among women whose most recent prior mammogram occurred less than 1 year and 1–2 years earlier, respectively.
DISCUSSION |
---|
Free text might explain some apparent inconsistencies between assessments and recommendations in our data, although the text was not available for analysis. Rarely (0.11%, category 1; 0.42%, category 2; 3.13%, category 3), women with a negative, benign, or probably benign assessment were referred for clinical and/or surgical evaluation. While an incorrect use of the BI-RADS might explain the discrepancy, it is also possible that the discrepancy is due to the presence of a clinical finding. Even though the examination was intended for screening, clinical findings occur, and depending on the finding the radiologist may recommend surgical evaluation or reexamination by the referring clinician. A limitation of these data is that we cannot separate a recommendation for clinical evaluation from one for surgical evaluation. The surgical recommendation is inconsistent with the BI-RADS, but the text might show that it was clinically indicated if a physical finding was present. Automated mammography reporting systems might include standard text to address this situation, in which clinical findings are driving the recommendation.
Inconsistencies are much more common among the 8.1% of examinations with a breast abnormality on the image, particularly among examinations where there is a "probably benign" or "suspicious" abnormality. When assessments were recorded for each breast, a small proportion of the time (0.01% of all assessments), the most serious recommendation occurred in the breast that was contralateral to the most serious assessment. While it is a small proportion, it points out the need to confirm the laterality of any lesion before subsequent evaluation or biopsy.
Assessments and management recommendations for "probably benign" lesions are not completely consistent with the BI-RADS. Use of the assessment has been shown to vary with facility and age (14), and some controversy exists about whether it should ever be used until after full problem-solving imaging (15,16). Despite the interpretive challenges, the group of lesions appears to be highly predictive of a benign lesion in some centers, and associated with a cancer rate of 1% in our data (9,17). This suggests that the assessment category is being used correctly, since the expected rate of cancer for these lesions is less than 2% (18).
Since the cancer rate among the "probably benign" lesions is low, the problem with the implementation of the BI-RADS category 3 appears to be with the associated management recommendation. While a relatively high proportion of women with "probably benign finding" assessments received the expected short interval follow-up recommendation (4,782 [40.3%] of 11,861), a high proportion were recommended for additional imaging (n = 4,374; 36.9%), and some were recommended for "normal interval follow-up" (n = 2,227; 18.8%). We hypothesize that the high proportion recommended for additional imaging may reflect interest in conveying a level of concern to a referring clinician. The radiologist may want clarification of an ambiguous lesion that is judged likely to be cystic and therefore order additional imaging with ultrasonography. However, the radiologist may want to be clear that the risk of cancer is low so that the referring physician’s discussion conveys the appropriate level of breast cancer risk to the woman. The substantial proportion of women with annual follow-up may be due to a suggested protocol of following the abnormality annually after two 6-month follow-up mammograms (17). Our result that women with a "probably benign finding" were more likely to have a recommendation for annual mammography if they had a mammogram in the previous year supports this interpretation, but consideration should also be given to other explanations. For example, a recent prior mammogram was associated with less use of additional evaluation, suggesting that the comparison film provided a reassuring reference. This same reassurance might explain why annual follow-up, rather than 6-month follow-up, was recommended. However, more research into radiologists’ thinking in these circumstances is needed before we can provide an adequate explanation.
Improvements in the BI-RADS use might occur through some modifications in documenting instructions and teaching, although our results show that the current terminology establishes consistent recommendations for a majority of women. For example, the BI-RADS instructions might further improve consistency by specifying how to categorize "probably benign findings" that appear stable after 6- and 12-month follow-up assessments. Consideration could also be given to stating that such follow-up studies be classified as diagnostic. Teaching about the intended use of the BI-RADS category 0 could improve consistency if it meant that the category was used more frequently. The BI-RADS committee intended that additional imaging occur after category 0 assessments (Sickles EA, oral communication, 2001). Screening interpretations would therefore include categories 1, 2, and 0. Any additional imaging would be done after a category 0 interpretation, and then a category 1–5 would be assigned. It is expected that less than 10% of screening examinations would require recall for additional imaging (8). We report 5.1% of women being recommended for additional imaging, but they are distributed among several assessments, such that category 0 accounts for only 3.43% of all interpretations. Teaching might reduce recommendations for additional imaging after assessments other than "category 0, need additional imaging."
Changes in terminology could also be considered to further improve consistency. For example, we found that categories 4 and 5 assessments commonly include a proportion of women recommended for additional imaging. As noted earlier, this is not the intended use of the BI-RADS and may reflect a misunderstanding of how to use the categories. However, it may also be that the radiologist wants the evaluation of associated abnormalities to occur before a biopsy of an obvious lesion. The referring physician is therefore provided with a recommendation for additional imaging in order for an evaluation to be done in the appropriate order and to convey the radiologist’s level of concern. Because we commonly found additional imaging recommended among category 3, 4, and 5 assessments, some consideration could be given to modifying category 0 in order for the level of concern to be stated clearly, such as "incomplete assessment of a probably benign finding" and "incomplete assessment of a suspicious finding." In the former group, the imaging would be done primarily to clarify an ambiguity. In the latter group, the additional imaging would be done to clarify associated lesions and establish the extent of disease. Whether such a change would lead to more confusion or increased consistency needs further evaluation. It seems unlikely that teaching alone will eliminate the problem, if the current inconsistent use addresses a need to convey the radiologist’s level of concern. Now that the Mammography Quality Standards Act is in place, and more teaching has occurred about the BI-RADS, it will be important to see whether the inconsistency has been reduced.
One might also conclude that the BI-RADS should be implemented with only the assessment and recommendation linked by the recording software. However, implementation of recording software is still underway throughout the U.S., and many radiologists continue to dictate the assessments and recommendations. Education may increase consistency in the use of the BI-RADS terminology, but more needs to be done to evaluate the clinical concerns that may be limiting radiologists’ association of assessments and recommendations in practice. Such work would be facilitated by software that allowed automated coding to indicate that a recommendation is based on a clinical finding when there is no mammographic finding, since the BI-RADS is intended to address only mammographic findings. However, radiologists must guide the care of people, and so recommendations may be based on more than mammographic findings. Some facilities within our registries and some recording software have already implemented wording to explain management recommendations based on clinical knowledge. Such software might also identify inconsistencies between assessments and recommendations that might avoid the rare event (0.01%) noted in our data, in which the highest assessment and recommendation appeared to be in opposite breasts.
A limitation of this work is that these data were from 1997, before the Mammography Quality Standards Act required the recording of the BI-RADS assessments. However, the American College of Radiology first published its BI-RADS in 1993, and radiologists in these registries had been using the terminology since at least 1996 and, in many cases, well before that time (1,3). The Food and Drug Administration requirement for using the BI-RADS assessments was implemented in April of 1999 and does not require their linkage to recommendations. It is therefore unlikely that the requirement itself will change the relationship between the assessments and the recommendations, but future research should evaluate this possibility.
Another limitation of this study is that it is not possible to assess how the recording systems used might have limited the association of interpretations and recommendations. The ability to independently record assessments and recommendations was a feature of recording sheets and software that was requested by radiologists at the start of this study. Registry sites used slightly different approaches but most simply had paper systems where the assessment and recommendation could each be indicated. Some facilities continued to dictate in addition to recording the findings on a paper form. While the structure of the form may have allowed the dissociation between the BI-RADS assessments and recommendations, it is impossible to assess the magnitude of the effect on this dissociation of differences in the recording systems themselves. It seems unlikely that the forms themselves did anything more than allow radiologists to record assessments and recommendations in accordance with their practice.
In conclusion, we provide evidence of how well terminology consistent with the BI-RADS was implemented in practice by 1997. While BI-RADS terminology was used consistently for the 91.9% of women with a "negative" or "benign finding" assessment, there is room for improvement in its use among the 8.1% of women who have a mammographic finding that needs additional attention.
The apparent inconsistency in use may also be important for planning educational efforts and for future research. Although the radiologists in this study all used the BI-RADS, that practice was not a requirement until April of 1999 (Mammography Quality Standards Act). In accordance with the requirement, the American College of Radiology has paid a great deal of attention to education regarding the BI-RADS terminology, including publishing the third edition of the BI-RADS manual, mailing of the manual to all American College of Radiology–accredited facilities, developing a continuing medical education course on the use of the BI-RADS, and creating a CD-ROM–based mammography interpretive skills self-assessment examination that emphasizes proper use of BI-RADS terminology (Sickles EA, oral communication, 2001) (3). Future teaching might address the inconsistencies reported here by stressing the appropriate use of "need additional imaging evaluation" and the distinction between "incomplete" and "final" assessments for screening interpretations, which is currently well documented in the BI-RADS manual. Future work should evaluate the effect of educational efforts, whether the inconsistencies reported here appear in other settings, whether the inconsistencies persist over time, and whether modifications in the reporting system are needed to address clinical realities.
REFERENCES |
---|