1 From the Department of Epidemiology, German Institute of Human Nutrition Potsdam, Rehbrücke, Germany (KH and HB), and the Department of Internal Medicine, University Hospital Hamburg-Eppendorf, Germany (BCZ and EW)
2 The funding for the CORA Study came primarily from the German Stifterverband für die Wissenschaft.
3 Address reprint requests and correspondence to K Hoffmann, Department of Epidemiology, German Institute of Human Nutrition, Arthur-Scheunert-Allee 114-116, 14558 Bergholz-Rehbrücke, Germany. E-mail: khoff{at}mail.dife.de.
ABSTRACT
Background: In previous studies, dietary patterns were derived in different populations without regard to a specific outcome.
Objective: The objective was to apply a new statistical method to construct a specific dietary pattern that is strongly associated with the risk of coronary artery disease (CAD).
Design: We applied reduced rank regression to a sample of 200 cases and 255 controls from the Coronary Risk Factors for Atherosclerosis in Women (CORA) Study. The CAD-specific dietary pattern was constructed by choosing intake data for 49 food groups as predictors and 5 established biomarkers for CAD as responses.
Results: A high score for the constructed dietary pattern was characterized by high intakes of meat, margarine, poultry, and sauce and low intakes of vegetarian dishes, wine, vegetables, and whole-grain cereals. After adjustment for known CAD risk factors, the relative risks from the lowest to the highest quintiles of the pattern score were 1.0, 1.1, 3.6, 6.2, and 12.3 (95% CI: 4.9, 30.9; P for trend < 0.0001). There was an approximate 4.5-fold difference in C-reactive protein and a 2-fold difference in C-peptide between the highest and lowest score quintiles of the study population. HDL-cholesterol concentrations ranged from 70 mg/dL in the lowest quintile to 49 mg/dL in the highest quintile of dietary pattern score.
Conclusion: The new statistical method, reduced rank regression, may be a useful tool for identifying dietary patterns that simultaneously affect the concentrations of known CAD biomarkers and the risk of developing CAD.
Key Words: Coronary artery disease, risk factors, nutrition, dietary patterns, biomarkers, statistical methods
INTRODUCTION
Many studies have attempted to link intakes of individual foods, food groups, or nutrients with the risk of coronary artery disease (CAD) (1-8). However, foods are consumed in combination, and their combined effect on CAD cannot be predicted from single effects. Therefore, diet-quality scores based on recommended diets or dietary guidelines (9-11) and generalized diet-quality scores calculated by using sums of quintile numbers of selected nutrients (12) have been proposed. Unfortunately, these scores reflect only a few aspects of diet and do not account for correlations between score components. Thus, other scores are needed that are functions of all consumed foods and that can cover the combined effect of diet. Such scores are called dietary patterns.
A possible approach to derive dietary patterns consists of applying statistical dimension-reduction techniques to the full data set of all food intakes (13). Two well-known dimension-reduction techniques similar in their mathematical foundation are principal component analysis and factor analysis. They aid in the construction of linear functions of food intakes that explain as much variation of the original intake data as possible. Although both statistical methods have been widely applied to derive dietary patterns in epidemiologic studies of CAD (14-17), neither of them aims to find out which combination of consumed foods most increases the risk of CAD. A dietary pattern explaining a high percentage of food consumption variation is not necessarily associated with the development of CAD, as can be seen from the nonsignificant results in the literature (16). Thus, another statistical method appropriate for constructing disease-specific dietary patterns is required.
Reduced rank regression (RRR) is a new dimension-reduction technique that can fill the gap. In contrast with the exploratory methods principal component analysis and factor analysis, RRR can allow for prior information about the pathway from diet to disease. Applying RRR to food groups and choosing biochemical markers for CAD as response variables, linear functions of food groups are derived by maximizing the proportion of explained biomarker variation. This approach ensures that a change in RRR pattern score results in a change in CAD biomarker concentrations, which is generally associated with a decrease or increase in CAD risk. We applied RRR to food consumption data collected through a food-frequency questionnaire (FFQ) in the Coronary Risk Factors for Atherosclerosis in Women (CORA) Study.
SUBJECTS AND METHODS
Study population
The CORA Study is a population-based case-control study. Cases were women aged 30-80 y admitted to the Department of Internal Medicine at the University Hospital Hamburg-Eppendorf, Germany, for incident CAD in the period from 1997 to 2001. Incident CAD was defined as a first acute myocardial infarction or a first episode of angina or other symptoms suggesting coronary artery disease. Inpatients were only included in the study if the first manifestation of coronary artery disease was verified by angiography and if they had no cancer, acute consumptive or severe chronic disease, a previous CAD diagnosis, or dietary advice regarding CAD. Identification of cases was undertaken 7 d/wk to prevent selection bias and to ensure blood sampling within 24 h of the onset of symptoms in the cases of acute myocardial infarction. The participation rate of eligible patients was 100%, or 200 cases.
For each patient, 2 controls of the same age and from the same city district of Hamburg were randomly chosen on the basis of the population registry. They were invited by mail to participate in the study. The invitation was followed by a reminder or a telephone call after 14 d. Exclusion criteria for the controls were the same as for cases. The final control group comprised 255 women, which corresponded to a participation rate of 67% of those who were eligible.
Data collection
Dietary intake information was collected by a self-administered scanner-readable FFQ originally designed for and applied in the German part of the EPIC (European Prospective Investigation into Cancer and Nutrition) Study (18). The FFQ included questions on the frequency and portion size of 146 food items eaten during the previous year. Foods were classified into 49 food groups based on nutrient profiles or culinary usage (19). Total caloric intake was calculated by summing up energy intakes from all foods by using data from the German Food Code BLSII.2. A validation study among members of the EPIC-Potsdam cohort showed good correlation between nutrients assessed by this FFQ and multiple 24-h dietary recalls. Energy-adjusted correlation coefficients ranged from 0.54 for dietary fiber to 0.86 for alcohol (20).
Information about sociodemographic characteristics and suspected risk factors for CAD, including smoking habits, physical activity, hormone replacement therapy, medication use, and family history of cardiovascular disease, was obtained by questionnaire and by conducting computer-assisted person-to-person interviews. Smokers were defined as cigarette smokers and were categorized into those subjects who smoked <20 and ≥20 cigarettes/d. Former smokers were categorized into those subjects who stopped smoking ≤2 y ago and >2 y ago, because previous studies have indicated that much of the coronary disease risk attributable to smoking disappears within 2 y of quitting (21, 22). Women were defined as postmenopausal if they had no regular monthly period for >1 y or were receiving hormone replacement therapy.
Anthropometric measurements including body weight, body height, waist circumference, and hip circumference were obtained while the subjects wore no shoes and only light underwear. Body weight was measured to the nearest 0.5 kg. The waist was taken at the smallest circumference between the lower rib margin and the iliac crest, and the hip circumference was determined over the greater trochanters. All interviews and physical examinations were performed by the same trained investigator.
A fasting blood sample was collected as soon as possible, and within 24 h at the latest in women who had experienced an acute myocardial infarction. Routine laboratory measurements were performed with the use of standard techniques. LDL cholesterol was calculated by using the formula of Friedewald. Blood pressure was measured 3 times after the interviews while the subjects were in a sitting position. The results of the second and third measurements were averaged (23). Subjects were considered to have hypertension if they were taking antihypertensive drugs or had a systolic blood pressure ≥140 mm Hg or a diastolic blood pressure ≥90 mm Hg according to the definition of the World Health Organization (24). All participants obtained an evaluation of their dietary habits and their laboratory results. The study protocol was approved by the ethical committee of the Aerztekammer Hamburg.
Statistical methods
To derive a dietary pattern predictive of CAD risk we applied RRR. RRR is a statistical dimension-reduction technique similar to principal component analysis; however, it works with 2 different sets of variables called predictors and responses (25). Whereas principal component analysis determines linear functions of predictors by maximizing the explained variation of all predictor variables, RRR identifies linear functions of predictors that explain as much response variation as possible. Principal component analysis is a special type of RRR in which the 2 sets of variables are identical. In the subsequent application of RRR, we choose food groups as predictors and biochemical markers for CAD as responses. Thus, we calculated linear functions of food group intakes (dietary patterns) that explain as much variation in CAD biomarkers as possible.
The implementation of RRR is based on the determination of eigenvalues and corresponding eigenvectors of the covariance matrix of responses. Choosing the eigenvectors as coefficient vectors for responses results in different linear functions of responses called response scores. These response scores have to be projected on the space of predictors to obtain linear functions of predictors called factor scores. The successively extracted scores are uncorrelated because the eigenvectors are always orthogonal. Each factor score has an associated eigenvalue quantifying the proportion of response variation that is explained by this factor only. Therefore, factors should be sorted by decreasing eigenvalues. The first factor obtained by RRR is optimal in the sense that no other linear function of predictors explains more response variation than does this factor. RRR is implemented in the PLS procedure of SAS for Windows (release 8.02; SAS/STAT User's Guide 1999). A more detailed description of the method, including the SAS code and its application in nutritional epidemiology, can be found elsewhere (26).
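The extraction steps described above can be illustrated with a minimal sketch in Python. It assumes one common formulation of RRR in which the eigenvectors are taken from the covariance matrix of the responses after projection on the predictors (the least-squares fitted responses). The function name, the standardization of both variable sets, and the use of NumPy are illustrative assumptions; this is not the SAS code used in the study.

```python
import numpy as np

def reduced_rank_regression(X, Y):
    """Illustrative RRR sketch (assumed formulation, not the study's SAS code).

    X: (n, p) predictor matrix, e.g. food group intakes.
    Y: (n, q) response matrix, e.g. biomarker concentrations.
    Returns factor scores, predictor weights, and the proportion of total
    (standardized) response variation explained by each factor.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Standardize both sets of variables (an assumption of this sketch).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

    # Project the responses on the space of predictors by least squares.
    B, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
    Y_hat = Xs @ B

    # Eigen-decomposition of the covariance matrix of the fitted responses.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Y_hat, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Factor scores are linear functions of the predictors: Xs @ weights.
    weights = B @ eigvecs
    factor_scores = Y_hat @ eigvecs

    # Each standardized response has variance 1, so total variation equals q.
    explained = eigvals / Ys.shape[1]
    return factor_scores, weights, explained
```

Under these assumptions the factor scores are mutually uncorrelated, and the first element of `explained` plays the role of the proportion of total biomarker variation attributed to the first factor.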
Adjusted means were calculated, and the differences in adjusted means were tested to be significantly different from zero by using the statement LSMEANS of the general linear model procedure in SAS. Conditional logistic regression was used for estimating relative risks of CAD adjusted for potential confounding variables. Trends in CAD risk across a continuous risk factor were assessed in logistic models by categorizing the risk factor in quintiles and by using the quintile number as an independent variable.
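The trend assessment described above can be sketched as follows. Note that this sketch swaps in an ordinary (unconditional) logistic regression fitted by Newton-Raphson, without covariate adjustment, in place of the conditional logistic regression run in SAS; the function names and the rank-based quintile assignment are assumptions made for illustration only.

```python
import numpy as np

def quintile_numbers(score):
    """Assign quintile numbers 1..5 by rank (groups as equal-sized as possible)."""
    score = np.asarray(score, dtype=float)
    ranks = np.argsort(np.argsort(score))      # ranks 0..n-1
    return ranks * 5 // len(score) + 1

def logistic_trend(quintile, case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * quintile by Newton-Raphson.

    Unconditional logistic regression, used here as a simplified stand-in
    for the conditional model. Returns coefficients and standard errors.
    """
    x = np.asarray(quintile, dtype=float)
    y = np.asarray(case, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p * (1.0 - p)
        information = design.T @ (design * w[:, None])
        beta = beta + np.linalg.solve(information, design.T @ (y - p))
    # Standard errors from the inverse information matrix at the estimate.
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(design.T @ (design * w[:, None]))
    return beta, np.sqrt(np.diag(cov))
```

In this simplified setting, the ratio of the slope `beta[1]` to its standard error gives a Wald statistic for the trend across quintiles.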
Biomarkers for CAD
For this RRR analysis we exclusively chose CAD biomarkers as response variables. We confined ourselves to primary biomarkers measured in serum, which can provide the link between the consumption of specific food groups and the development of CAD. Those biomarkers simultaneously allow the measurement of dietary exposure and act as indicators of health effects before the final outcome (27). Conventional biomarkers for CAD are LDL and HDL cholesterol. It has been shown by clinical intervention that lowering LDL-cholesterol concentrations and raising HDL-cholesterol concentrations decrease the risk of CAD (28). The predictive utility of serum cholesterol has been shown in several studies (29-32). However, despite its predictive utility, LDL cholesterol may not necessarily distinguish controls from cases at the time of manifestation of CAD (33).
Besides cholesterol, we chose lipoprotein(a) as a further response variable for RRR because it seems to be a predictor of future myocardial infarction, coronary death, and angina pectoris (34-39). Several prospective studies showed that elevated lipoprotein(a) further increased the risk of CAD in subjects with low HDL-cholesterol and high LDL-cholesterol concentrations (38, 39). Finally, we chose C-reactive protein (CRP), the classic acute-phase protein, and C-peptide as the fourth and fifth response variables in our statistical analysis. Recent studies of CRP showed a consistent positive association with future cardiovascular events in initially healthy subjects (40-43). High serum concentrations of C-peptide, which indicate insulin resistance, significantly increased the risk of CAD in diabetic and nondiabetic patients (44-46).
RESULTS
The statistical method RRR was applied to the pooled data set of 455 cases and controls; 49 food groups were chosen as predictors and the 5 selected biomarkers were chosen as responses. The first RRR factor explained 17.8% of the variation in HDL cholesterol, 14.8% of that in C-peptide, 5.9% of that in CRP, and 7.8% of the total variation of all 5 biomarkers. The subsequent 4 RRR factors explained only 4.4%, 2.1%, 1.7%, and 1.0% of total biomarker variation, respectively. Therefore, we only considered the first RRR factor as the dietary pattern of interest. The response score of the first RRR factor was characterized by a negative coefficient for HDL cholesterol and by positive coefficients for the other 4 selected biomarkers, suggesting a direct association with CAD incidence.
The major contributors to the dietary pattern are quantified in Table 1. Intakes of meat, margarine, other vegetable fats and oils (except margarine and olive oil), poultry, and sauce were directly associated with the score, as shown by positive score parameters and positive correlations with the score. In contrast, consumption of vegetarian dishes, wine, cooked vegetables, raw vegetables, and whole-grain cereals and muesli decreased the pattern score. Together, the 10 food groups listed in Table 1 explained 71.9% of the variation in score; the most important contributors were vegetarian dishes, wine, and meat.
TABLE 1. Food groups that are strongly associated with the dietary pattern obtained by reduced rank regression in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
As shown in Table 2, the mean intake of the 10 most important food groups varied clearly across quintiles of the dietary pattern score. There was an approximate 2.5-fold difference in meat and margarine intakes and a 2-fold difference in the intakes of poultry, sauce, and other vegetable fats and oils (except margarine and olive oil) between the highest and lowest quintiles of the study population. Wine consumption ranged from 103 g/d in the lowest quintile to 17 g/d in the highest quintile of pattern score. Trends in all 10 food groups across quintiles of pattern score were statistically significant (P < 0.0001). Moreover, mean biomarker concentrations for each score quintile are shown in Table 2. It is striking that HDL cholesterol decreased, whereas C-peptide and CRP increased across increasing quintiles of pattern score (P < 0.0001). In other words, a high dietary pattern score was associated with a biomarker pattern of low HDL cholesterol, high C-peptide, and high CRP. No trend across quintiles of dietary pattern score was perceivable for LDL cholesterol and lipoprotein(a).
TABLE 2. Food intake and biomarker concentrations according to quintiles of the dietary pattern obtained by reduced rank regression in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
To explore whether the 5 selected response variables could be considered biomarkers of CAD, we calculated crude and adjusted means for cases and controls separately (Table 3). The subjects with CAD had significantly lower HDL-cholesterol and significantly higher lipoprotein(a), C-peptide, and CRP concentrations than did the subjects without CAD. These differences were attenuated after adjustment for age, BMI, waist-hip ratio, cigarette smoking, hormone replacement therapy, hypertension, education level, physical activity level, sports (h/wk), and total energy intake but remained highly significant (P < 0.0001), except for lipoprotein(a). After further adjustment for the other 4 biomarkers, the differences were smaller and the P value increased slightly. No significant difference between cases and controls was observed for LDL cholesterol.
TABLE 3. Mean biomarker values for cases and controls in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
The correlation structure between biomarkers is described in Table 4. HDL cholesterol, C-peptide, and CRP were significantly correlated with each other (P < 0.0001); however, the correlation coefficients with HDL cholesterol were negative. A moderate positive correlation was observed between LDL cholesterol and lipoprotein(a). The data in Tables 3 and 4 suggest that the simultaneous occurrence of low HDL-cholesterol and high C-peptide and CRP concentrations is a prevalent biomarker profile in the study population that is likely related to the development of CAD.
TABLE 4. Pearson's correlation coefficients between plasma biomarkers in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
We then used the RRR dietary pattern as a predictor of CAD in a logistic regression model. A higher dietary pattern score was strongly associated with a monotonically higher risk of CAD (Table 5). The number of cases increased markedly across increasing quintiles of pattern score. Each quintile contained 91 subjects; only 12 subjects in the lowest quintile had CAD, whereas 71 subjects in the highest quintile had CAD. The crude relative risks across increasing quintiles of dietary pattern score were 1.0, 1.9, 4.9, 11.6, and 23.4 (P for trend < 0.0001). Adjustment for age, cigarette smoking, hormone replacement therapy, hypertension, education level, physical activity level, and sports attenuated the risks. The adjusted relative risks across increasing quintiles were 1.0, 1.4, 3.2, 6.5, and 11.4 (P for trend < 0.0001). The strongest confounding factors were hypertension and cigarette smoking. Further adjustment for BMI and waist-hip ratio, which are strong predictors of CAD and are affected by diet, did not materially change the effect of the dietary pattern score on risk. After this further adjustment, the relative risk for the comparison of the highest with the lowest quintile was 12.3 (95% CI: 4.9, 30.9; P for trend < 0.0001).
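As a worked check of the crude estimate for the extreme quintiles, the case counts given above (12 of 91 and 71 of 91) can be converted into an odds ratio. The sketch below assumes that the reported crude relative risk can be read as a simple unmatched odds ratio; the helper name is hypothetical.

```python
def crude_odds_ratio(cases_exposed, n_exposed, cases_reference, n_reference):
    """Unmatched odds ratio from case counts in two groups of known size."""
    odds_exposed = cases_exposed / (n_exposed - cases_exposed)
    odds_reference = cases_reference / (n_reference - cases_reference)
    return odds_exposed / odds_reference

# Highest quintile: 71 cases of 91; lowest quintile: 12 cases of 91.
print(round(crude_odds_ratio(71, 91, 12, 91), 1))  # prints 23.4
```

The result, (71/20)/(12/79) ≈ 23.4, agrees with the crude relative risk reported for the highest quintile.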
TABLE 5. Relative risks (95% CIs) of coronary artery disease according to quintiles of the dietary pattern obtained by reduced rank regression in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
We conducted several sensitivity analyses to examine the robustness of our findings. We applied RRR to a reduced set of response variables by omitting biomarkers one at a time. As a result, the first RRR factor score did not change substantially and remained a strong predictor of CAD when adjusting for the same covariates as before (Table 6). The relative risks of CAD increased monotonically across increasing quintiles of the score (P < 0.0001) regardless of which biomarker was omitted. Furthermore, we enlarged the set of response variables by adding other biomarkers measured in serum. When the 5 markers triacylglycerols, apolipoprotein A-I, homocysteine, folate, and iron were added to the 5 originally chosen biomarkers, the association between the first RRR factor and the risk of CAD was even stronger (Table 6). Other choices of up to 5 additional response variables yielded somewhat weaker results, but in each case the first RRR factor score was a strong predictor of CAD, and the trend across increasing quintiles of the pattern score was highly significant (P < 0.0001).
TABLE 6. Adjusted relative risks (95% CIs) of coronary artery disease according to quintiles of dietary pattern scores derived by reduced rank regression with different biomarkers as response variables in the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
To address concerns that the results are not reproducible in other data sets, we randomly split the sample in half and ran separate analyses in the 2 subsamples. The relative risks of CAD were attenuated if the pattern score was used in the other subsample; however, a significant trend (P = 0.01 and P = 0.03, respectively) across increasing quintiles of the score remained (Table 7). To reduce the dependency of the dietary pattern on the data, we applied the simplified pattern approach (47). We constructed a simplified pattern variable by omitting the 39 food groups with small score parameters and by setting the remaining positive score parameters equal to 1 and the remaining negative score parameters equal to -1. The resulting pattern variable was simply the sum of the 5 standardized food groups given in the top of Table 1 minus the sum of the 5 standardized food groups given in the bottom of Table 1. The adjusted relative risks across increasing quintiles of the simplified pattern score were 1.0, 3.1, 4.8, 9.9, and 6.5 (P for trend = 0.0005) and 1.0, 1.3, 2.0, 2.2, and 6.6 (P for trend = 0.002), which indicated a similar effect on CAD risk in both subsamples.
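The simplified pattern variable described above can be sketched in a few lines of Python. The function name, the column-index arguments, and the use of the sample standard deviation for standardization are assumptions made for illustration; the exact standardization used in the study is not stated here.

```python
import numpy as np

def simplified_pattern_score(intakes, positive_cols, negative_cols):
    """Sum of standardized intakes of food groups with positive score
    parameters minus the sum for food groups with negative parameters.

    intakes: (n_subjects, n_food_groups) array of food group intakes.
    positive_cols, negative_cols: hypothetical column indices of the
    retained food groups (e.g. 5 each for the groups in Table 1).
    """
    intakes = np.asarray(intakes, dtype=float)
    z = (intakes - intakes.mean(axis=0)) / intakes.std(axis=0, ddof=1)
    return z[:, positive_cols].sum(axis=1) - z[:, negative_cols].sum(axis=1)
```

For example, with three subjects, one positively weighted food group with intakes 1, 2, 3 and one negatively weighted group with intakes 3, 2, 1, the scores are -2, 0, and 2.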
TABLE 7. Adjusted relative risks (95% CIs) of coronary artery disease according to quintiles of different dietary pattern scores in subsamples of the Coronary Risk Factors for Atherosclerosis Study, 1997-2000
DISCUSSION
In the CORA Study we found a strongly significant association between a specific dietary pattern and the risk of CAD. We used the new statistical method RRR to derive this pattern by maximizing the percentage of explained variation of established biomarkers of CAD. A high score of the dietary pattern corresponded to a biomarker profile of high concentrations of C-reactive protein and C-peptide and low concentrations of HDL cholesterol. RRR was not only useful in identifying dietary changes that increase or decrease CAD risk but also in elucidating the possible pathway from diet to disease by considering changes in biomarker concentrations.
What is the interpretation of a dietary pattern score derived by RRR? In contrast with a pattern obtained by principal component analysis or factor analysis, an RRR factor score does not necessarily represent a combination of foods and drinks that are often eaten together in the study population. An RRR dietary pattern is a combination of food intakes that affects biomarker concentrations if biomarkers were chosen as response variables. The aim of this RRR application was not to describe real-world dietary behaviors but rather to find out what variation in diet is important for the development of disease. Persons with a high RRR score have a higher disease risk than do those with a low score. Moreover, the score coefficient of a food group allows the single effect of this food group on disease to be evaluated, supposing that the intakes of all other food groups are unchanged.
Participants who ate more meat, margarine, other vegetable fats and oils (except margarine and olive oil), poultry, and sauce, and consumed smaller amounts of vegetarian dishes, wine, vegetables, and whole-grain cereals had a high pattern score associated with a high risk of CAD. The observed effects of the food groups are broadly consistent with findings of others. High consumption of red and processed meat and high-fat products has been identified as a main constituent of the Western diet, which is associated with an unfavorable biomarker profile (48) and elevated risk of CAD (14, 15). In the Seven Countries Study, animal foods (including meat) were highly correlated with CAD death rates (49). The role of margarine in the development of CAD has been intensively studied (50, 51). Margarine, especially hard margarine, is a major source of trans fatty acids, which increase LDL-cholesterol and decrease HDL-cholesterol concentrations, and in turn, increase the risk of heart disease (52, 53). Although the content of trans fatty acids in margarine has strongly decreased in Germany, some German sunflower margarines still contain these unfavorable fatty acids (54).
Consistent with our results, several studies have shown a significant inverse relation between vegetable intake and risk of CAD (4, 49, 55). In a pooled analysis of the Nurses' Health Study and the Health Professionals Follow-up Study, the relative risk of CAD in a comparison of extreme quintiles of vegetable intake was 0.82 (95% CI: 0.71, 0.94) (6). However, although fruit consumption also has a protective effect on CAD incidence, it does not contribute to the RRR dietary pattern. The beneficial effect of whole grain intake as a constituent of the CAD dietary pattern obtained in the CORA Study was previously found in 2 large prospective studies (2, 56, 57).
Many studies have shown that light-to-moderate alcohol consumption was associated with a lower CAD risk (58-62). An experimental dose of 30 g ethanol/d increased the concentrations of HDL cholesterol by 4 mg/dL, which indicated that the effect of alcohol consumption on CAD risk may partially be mediated by HDL cholesterol (63). There is controversy concerning the type of alcoholic beverage responsible for the effect (62). Our finding suggests that wine has an effect on CAD incidence, which is mediated through the selected biomarkers.
A main limitation of the present study was its case-control design. Although we did not include inpatients who received dietary advice, we cannot rule out that some cases changed their dietary habits after experiencing the first symptoms of CAD. Moreover, knowledge of disease status can cause recall bias. Furthermore, selection bias in the recruitment of controls from the general population was possible because only 67% of the control candidates participated in this study, and the participants were probably more health conscious than were the nonresponders. It is necessary to replicate this RRR analysis in large cohort studies. The preliminary results of applying RRR in a nested case-control study (26) suggest that the association between dietary pattern score and disease could be weaker in a study of prospective design, especially if the number of incident cases is small. A possible explanation for this suggestion is that the variation in RRR score will be low in studies with many controls, provided that the selected responses do not differ markedly between the controls. The stronger results found with this RRR application than with the previous approach (26) can partly be attributed to the use of biomarkers instead of disease-related nutrients as response variables.
When biomarkers are chosen as response variables in RRR, the optimal time to take blood samples for the measurement of biomarker concentrations is a critical issue. In a case-control study design, blood samples are always taken from the cases after the diagnosed onset of the disease. Thus, biomarker concentrations can be affected by disease and should not automatically be considered indicators of health effects preceding the disease. In contrast, a cohort study would ensure that biomarker measurements are always made before the final outcome. However, the time period that elapses between the collection of blood samples and the onset of disease can be large and can differ between individuals, which decisively diminishes the predictive power of biomarkers. The best time point to predict disease by biomarkers is immediately before disease. Assuming that biomarker concentrations do not rapidly decrease or increase with the onset of disease, which should be satisfied for the selected biomarkers (except for CRP), blood sampling within 24 h of the onset of symptoms as accomplished in the CORA Study should be close to optimal.
In conclusion, the new statistical method RRR is a powerful tool to derive dietary patterns that predict the incidence of CAD. This study showed that a diet high in vegetarian dishes, wine, vegetables, and whole-grain cereals and low in meat, margarine, poultry, and sauce is associated with a more favorable biomarker profile and a reduced risk of CAD. Together with other available evidence, these results can be used when developing dietary recommendations to prevent premature cardiovascular events. The CAD-specific dietary pattern score may also be helpful in evaluating individual eating habits in dietary counseling.
ACKNOWLEDGMENTS
We thank the participants of the CORA Study for their cooperation, Ellen Kohlsdorf and Wolfgang Bernigau for data management, Kerstin Klipstein-Grobusch and Christin Heidemann for critical comments on the revised draft of the manuscript, and Jürgen Kratzsch and Roman Jung for performing the laboratory measurements.
B-CZ and EW planned and conducted the CORA Study. KH conducted the statistical analysis. KH and HB wrote the first draft of the manuscript. All authors stated having no conflict of interest due to authorship of the submitted paper.
REFERENCES