A Propensity Score Matched Case-Control Study of Historical Physical Activity and Mortality in U.S. Adults

Background: Propensity score matching (PSM) is a technique that matches individuals based on their predicted probability relative to specified characteristics. In a case-control study, controls can be selected using PSM so that they are similar to cases on all relevant covariates. The purpose of this study was to use PSM to investigate the relationship between historical physical activity (PA) and mortality in a large national survey.


Introduction
Physical activity (PA) is a preventive health behavior that protects against chronic disease as well as premature mortality [1]. Current U.S. guidelines for PA recommend that all adults accumulate at least 150 minutes of moderate-intensity PA each week [2]. A recent study examined the relationship between increasing amounts of PA and all-cause mortality in a large sample of women, where PA was assessed objectively using accelerometers [3]. Results from this study showed that both total PA amount (counts/day) and moderate-to-vigorous PA amount (min/day) were inversely related to all-cause mortality risk in women. Another recent study examined the association between PA and all-cause mortality among elderly Hong Kong adults [4]. Results from this study showed that PA, assessed by an age-specific PA scale, was inversely related to all-cause mortality. Moreover, the PA and mortality relationship remained after controlling for cardiorespiratory fitness. Similar inverse associations between PA and mortality have been reported for heart disease [5][6][7] and cancer-related [8][9][10][11] mortality.
Case-control studies are retrospective observational studies that are commonly used in epidemiological research [12]. In a case-control study, "cases" are first selected based on the presence of a specific disease, and "controls" without the disease are then selected so as to match the cases on certain characteristics (e.g., age and sex) [13]. Historical exposures (i.e., risk factors) are assessed and then compared between the cases and controls to determine if associations exist. Case-control studies are considered "retrospective" because they begin with the outcomes and look back (historically) to the causes. For example, a case-control study in a Fujian population was used to determine if PA was associated with lung cancer [14]. In that study, cases were those with newly diagnosed lung cancer and controls were age- and sex-matched individuals without lung cancer. Participants reported their occupational and recreational PA over the past two years. Results showed that occupational PA was associated with an increased risk of lung cancer, whereas recreational PA was associated with a decreased risk of lung cancer. PA behavior has been examined historically in many case-control studies concerning such outcomes as cancer [15][16][17], stroke [18], diabetes [19], chronic obstructive pulmonary disease [20], and liver disease [21].
By design, large cross-sectional surveys do not have the data needed for case-control studies. However, some U.S. health surveys do have follow-up, probability-matched mortality status linked to their participants [22]. These linked datasets can be used in case-control studies, where mortality status provides the information for case and control assignment and the initial survey responses serve as historical exposure information. A final case-control design feature, however, is the matching of controls to identified cases in these large datasets (Table 2). Propensity score matching (PSM) is a statistical technique that can match individuals based on their predicted probability relative to specified characteristics [23]. Thus, using large national health datasets with identified cases (e.g., deaths), controls can be selected using PSM so that they are similar to cases on all relevant covariates (e.g., age and sex) [24]. Therefore, the purpose of this study was to use PSM to investigate the relationship between historical PA and mortality in a large national survey.
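As a minimal sketch of the matching idea described above, the following Python function pairs each case with its nearest available controls on a propensity score. It assumes the scores have already been estimated (for example, from a logistic regression of case status on the covariates). The function name `match_controls`, the greedy nearest-neighbor strategy, the `k` controls per case, and the `caliper` value are all illustrative assumptions and are not taken from this study.

```python
def match_controls(case_scores, control_scores, k=1, caliper=0.05):
    """Greedy nearest-neighbor matching without replacement (illustrative only).

    case_scores / control_scores: dict mapping participant id -> propensity score.
    k: hypothetical number of controls to seek per case.
    caliper: hypothetical maximum allowed score difference for a match.
    Returns a dict mapping each case id to a list of matched control ids.
    """
    available = dict(control_scores)  # controls not yet used
    matches = {}
    for case_id, p in case_scores.items():
        # Rank the remaining controls by distance to this case's score.
        nearest = sorted(available, key=lambda c: abs(available[c] - p))
        # Keep up to k of the closest controls that fall within the caliper.
        chosen = [c for c in nearest[:k] if abs(available[c] - p) <= caliper]
        for c in chosen:
            del available[c]  # each control is matched at most once
        matches[case_id] = chosen
    return matches


# Hypothetical scores for two cases and four candidate controls.
cases = {"case1": 0.62, "case2": 0.30}
controls = {"c1": 0.60, "c2": 0.33, "c3": 0.90, "c4": 0.28}
print(match_controls(cases, controls))  # {'case1': ['c1'], 'case2': ['c4']}
```

Greedy matching depends on the order in which cases are processed, so a real analysis would normally rely on an established matching procedure and check covariate balance afterward.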

Participants and design
Data for this research came from the 2001-2002 National Health and Nutrition Examination Survey (NHANES) along with its associated mortality file [25]. NHANES is a continuous survey created to assess health and nutrition status and behavior in the U.S. population. NHANES data are organized by category: demographics, dietary, examination, laboratory, questionnaire, and limited access. NHANES data are unique because of the physical examination (by health professionals) and clinical laboratory data components. Mortality data are matched to NHANES participants by the National Center for Health Statistics (NCHS), with the most recent mortality follow-up ending on December 31, 2011. For this study, a total of 1,604 participants who were 18+ years of age, who answered all relevant questions, and who were eligible for mortality linkage were included (Table 3).

Measures
Five different independent variables were used in this study: moderate or vigorous PA (MOVPA), moderate PA only (MPA), vigorous PA only (VPA), moderate and vigorous PA (MAVPA), and sedentary (ST) behavior. The four activity variables (MOVPA, MPA, VPA, and MAVPA) were determined from a series of questions asking respondents if they participated in moderate or vigorous activities [26]. Moderate-intensity activity was assessed by the following question: "Over the past 30 days, did you do moderate activities for at least 10 minutes that cause only light sweating or a slight to moderate increase in breathing or heart rate? Some examples are brisk walking, bicycling for pleasure, golf, and dancing." Vigorous-intensity activity was assessed by the following question: "Over the past 30 days, did you do any vigorous activities for at least 10 minutes that caused heavy sweating, or large increases in breathing or heart rate? Some examples are running, lap swimming, aerobics classes or fast bicycling." Respondents answering "yes" to either question were considered to be participating in MOVPA. Participants answering "yes" to only the first or only the second question were considered to be participating in MPA and VPA, respectively. Finally, participants answering "yes" to both questions were considered to be participating in MAVPA. ST behavior was assessed by the following question: "Over the past 30 days, on a typical day how much time altogether did you spend on a typical day sitting and watching TV or videos or using a computer outside of work?" Participants reporting 5+ hours per day were considered sedentary. Three different mortality status variables were used: all-cause (case/control), coronary heart disease (CHD) (case/control), and cancer (case/control). Finally, five covariates were used for PSM: age, sex, race, marital status, and income.
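The exposure definitions above can be sketched in Python as follows. This is a minimal illustration that follows the literal wording of the definitions (MPA and VPA as "only" one of the two answers). The names `Exposure` and `classify`, and the representation of screen time as a numeric number of hours, are assumptions made for illustration and do not reflect actual NHANES variable names or response coding.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    movpa: bool  # "yes" to the moderate or the vigorous question
    mpa: bool    # "yes" to the moderate question only
    vpa: bool    # "yes" to the vigorous question only
    mavpa: bool  # "yes" to both questions
    st: bool     # 5+ hours/day of TV, video, or computer use


def classify(moderate: bool, vigorous: bool, screen_hours: float) -> Exposure:
    """Derive the five exposure flags from the questionnaire answers.

    moderate / vigorous: answers to the two 30-day activity questions.
    screen_hours: typical daily sitting/screen time (hypothetical numeric form).
    """
    return Exposure(
        movpa=moderate or vigorous,
        mpa=moderate and not vigorous,
        vpa=vigorous and not moderate,
        mavpa=moderate and vigorous,
        st=screen_hours >= 5,
    )


# A respondent reporting both intensities and 2 hours of daily screen time.
print(classify(moderate=True, vigorous=True, screen_hours=2))
```

Under this reading, MPA, VPA, and MAVPA are mutually exclusive, and MOVPA is true whenever any one of them is true.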

Statistical Analysis
Three separate PSM studies were carried out: all-cause mortality, cancer mortality, and CHD mortality. PSM required four distinct steps.
Step 1 consisted of merging the NHANES survey dataset with the mortality linked file.
Step 2 consisted of removing participants with any missing relevant data.
Step 3 consisted of the creation of the three separate mortality datasets.
Step 4 consisted of the PSM. Controls were propensity score matched to cases in each study using age, sex, race, income, and marital status. SPSS v24 was used for all PSM [27]. Logistic regression was used to model the relationship between PA and mortality and to compute odds ratios (ORs) with 95% confidence intervals (CIs). SAS version 9.4 was used for all statistical modeling [28,29]. The significance level was set at p=.05 for all tests.

Results
Table 1 displays the sociodemographic comparison between cases and matched controls across the all-cause, CHD, and cancer studies. The analyses included 512 cases and 1,092 controls for all-cause mortality, 148 cases and 441 controls for cancer mortality, and 86 cases and 256 controls for heart disease mortality, with an average follow-up of 9.2 years. After PSM, cases and controls appeared similar on most sociodemographic variables (age, sex, race, and income) across all three studies. The only exception was age in the all-cause mortality matching, where cases were more likely to be older (70+ years; 52.2%) than controls (33.2%, p<.001).
Table 2 displays sample-specific estimates of PA behavior among participants who experienced mortality. PA differences among those who experienced mortality were evident, with only 6.5% of adults dying from all causes historically reporting MAVPA, as compared to 93.5% (p<.001) not reporting MAVPA. This difference was consistent across the CHD and cancer-related mortality studies. Similarly, only 14.0% of adults dying from all causes historically reported VPA, as compared to 86.0% (p<.001) not reporting VPA. These differences were also consistent across the CHD and cancer-related mortality studies. In terms of ST behavior, fewer participants dying from all causes were ST (30.8%) than not (69.2%) (p<.001). More noteworthy, ST behavior was more prevalent among those dying of CHD (37.5%) than among those dying of cancer (22.3%).
Table 3 displays the odds of all-cause, CHD, and cancer survival among adults historically reporting various forms of PA behavior. In the all-cause study, adults reporting MOVPA (OR=1.80, 95% CI 1.44-2.24), MPA (OR=1.51, 95% CI 1.21-1.89), VPA (OR=2.35, 95% CI 1.75-3.16), and MAVPA (OR=2.60, 95% CI 1.73-3.90) at baseline were more likely to have survived than their less active counterparts. In addition, those assessed as ST were less likely (OR=0.43, 95% CI 0.33-0.55) to have survived than their less ST counterparts. Similar findings were observed for the CHD study, with the exception that only MOVPA, VPA, and ST behaviors were related to survival. Finally, all PA measures were significantly related to cancer-related survival, with the exception of ST behavior.
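To illustrate how an odds ratio of survival and its 95% CI are obtained, the sketch below computes both from a 2x2 table of exposure by survival status. The study itself used logistic regression in SAS. For a single binary exposure with no covariates, the cross-product ratio and Wald interval shown here give the same estimate as such a model. The function name `odds_ratio_ci` and the cell counts are hypothetical and are not data from this study.

```python
import math


def odds_ratio_ci(exp_survived, exp_died, unexp_survived, unexp_died, z=1.96):
    """Odds ratio of survival (exposed vs. unexposed) with a Wald 95% CI.

    Arguments are the four cell counts of a 2x2 table; all must be positive.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    # Cross-product ratio: odds of survival if exposed / odds if unexposed.
    odds_ratio = (exp_survived * unexp_died) / (exp_died * unexp_survived)
    # Standard error of the log odds ratio.
    se = math.sqrt(
        1 / exp_survived + 1 / exp_died + 1 / unexp_survived + 1 / unexp_died
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 60 active survivors, 20 active deaths,
# 40 inactive survivors, 40 inactive deaths.
or_, lower, upper = odds_ratio_ci(60, 20, 40, 40)
print(f"OR={or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1.0, as with the PA estimates reported above, corresponds to a statistically significant association at the .05 level.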

Discussion
The first purpose of this study was to use PSM to individually match control group participants to case (experienced mortality) participants in the NHANES dataset. This part of the study was adequately performed, as seen by a relatively even sociodemographic match between cases and controls. PSM has been used successfully by other researchers to aid in creating case-control designs in survey data [30][31][32]. An additional purpose of this study was to investigate the relationship between historical PA and all-cause, CHD, and cancer-related mortality. Results clearly showed that adults participating in various forms of PA at baseline were more likely to have survived to follow-up. In the all-cause mortality analysis, all forms of PA as well as ST behavior were related to survival. In addition, these findings remained significant after controlling for confounding variables. The strongest survival relationship appeared related to MAVPA. This finding is consistent with other research. One recent study among Swedish adults used accelerometer-measured PA to examine its relationship with all-cause, CHD, and cancer mortality [33]. Results of this study showed an inverse relationship with time spent in moderate-to-vigorous PA and survival time, across all three mortality scenarios. Additionally, results of this study found that the most ST adults were at greatest risk for mortality, in all three mortality scenarios.
Juniper Online Journal of Public Health
In the CHD mortality analysis, VPA was the strongest predictor of survival, whereas MPA was unrelated to survival. Data supporting this specific finding are sparse. However, one study among older men examined the relationship between self-reported PA and CHD events [34]. This study found an inverse relationship between PA intensity (occasional PA, light PA, moderate PA, and vigorous PA groups) and risk of a CHD event, with the vigorous-intensity group experiencing the lowest risk of these events. Finally, in the cancer mortality analysis, all forms of PA behavior were associated with survival, but ST behavior was not. Although ST was a suggestive predictor of survival, the association did not reach statistical significance. This finding is consistent with other research, where PA measures have been found to be associated with cancer mortality despite a lack of association with ST behavior [35].
This study has limitations worth discussing. The first limitation is the use of self-reported PA behavior at baseline, as opposed to a more objective method (e.g., accelerometers). In self-reported PA scenarios, participants must recall their activity, which allows for inaccurate recollection as well as inaccurate assessment of duration and intensity. In fact, many studies have shown overestimation of PA and underestimation of ST behavior when assessed by questionnaire [36][37][38]. However, the current study used a lower-bound criterion for PA, where adults were considered active (or not active) at a given intensity merely by acknowledging (or not acknowledging) a 10-minute minimum participation at that intensity. Therefore, the current study was not concerned with measuring specific PA amounts (e.g., min/day) at specific intensities. Thus, the bias in our findings should be minimal in comparison to other studies that use self-reported PA measures.
Another limitation worth mentioning is the use of historical PA as the main exposure variable (risk factor) in this study. Specifically, this study had no means of assessing changes in PA across the observational period. Therefore, it is possible that some adults in the sample were physically active (or inactive) at the baseline assessment and then became inactive (or active) before the end of mortality follow-up. This inability to assess changes in participant PA may have caused a certain amount of unmeasurable exposure misclassification. Consequently, the findings in this study may suffer from a certain amount of bias and should be interpreted with caution.

Conclusion
Results from this study support the use of PSM in developing a case-control study with large mortality-linked datasets. Furthermore, moderate and vigorous PA and less sedentary behavior were related to increased longevity in U.S. adults. Additionally, vigorous PA and less sedentary behavior were related to CHD survival, whereas all forms of PA behavior, but not sedentary behavior, were related to cancer survival. Health promotion specialists should consider different types of PA behavior when planning interventions to increase longevity in adults.