Test-Retest Stability of Four Common Body Composition Assessments in College Students

Introduction
The five components of health-related physical fitness are cardiorespiratory fitness, muscular strength, muscular endurance, flexibility, and body composition (BC) [1]. Although not a performance measure, BC is strongly associated with chronic diseases such as heart disease [2], cancer [3,4], diabetes [5], and stroke [6], as well as with injury [7] and all-cause mortality [8]. BC has also received increasing attention across the health sciences because of the rising prevalence of obesity [9,10]. By definition, BC refers to a set of measures that indicate the distribution of fat, lean mass, and minerals in the body [11]. Many criterion methods are available to assess BC (e.g., hydrostatic weighing), but these are generally restricted to laboratory settings [12].
Field-based techniques are the most practical form of BC assessment in generally healthy populations [13]. Among college student populations, field-based BC methods are likely the only option when one is interested in simple pre-program assessment or formative and summative fitness evaluation. Despite ample evidence supporting the accuracy of field-based BC assessments, a necessary prerequisite for validity is that they measure scores consistently across different time points [14]. Therefore, the purpose of this study was to examine the test-retest stability of four common BC assessments in college students.

Journal of Physical Fitness, Medicine & Treatment in Sports

Participants and design
A total of 38 participants who signed an institutional review board (IRB) approved consent form and had BC measurements taken by each of four methods at two different time points were included in this analysis. A repeated measures design was used, with participants assessed on two separate occasions (in the same week) using four different BC field methods. All methods and procedures for this study were reviewed by the institution's IRB.

Body composition measures
Four different BC measures were used in this study. Percent body fat (PBF) by skinfold technique (SF) was calculated (%) using the Siri equation, where body density was first estimated from the sum of chest, abdomen, and thigh skinfolds (for males) or triceps, suprailiac, and thigh skinfolds (for females) [15]. Waist circumference (WC) (cm) was measured in the same way for males and females, with an elastic tape placed at the narrowest point between the xiphoid process and umbilicus [15]. Body mass index (BMI) (kg/m²) was determined in the same way for males and females from height (cm), measured using a wall-mounted stadiometer, and weight (kg), measured using an electronic floor scale. Finally, PBF (%) by handheld bioelectrical impedance (HH) was measured using the Omron BF306 handheld device, as described by the manufacturer [16].
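The two derived measures above can be illustrated with a minimal sketch. It assumes the standard form of the Siri equation, PBF = 495 / body density - 450, and the usual BMI definition of weight (kg) divided by height (m) squared. The sex-specific regression that converts skinfold sums to body density [15] is not reproduced here, so body density is taken as an input. The function names and example values are hypothetical and are not study data.

```python
def siri_percent_body_fat(body_density: float) -> float:
    """Convert body density (g/cm^3) to percent body fat via the Siri equation."""
    if body_density <= 0:
        raise ValueError("body density must be positive")
    return 495.0 / body_density - 450.0


def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Compute BMI (kg/m^2) from weight in kilograms and height in centimeters."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # the stadiometer reading is in cm
    return weight_kg / height_m ** 2


# Hypothetical example values (not taken from the study sample).
example_pbf = siri_percent_body_fat(1.05)   # body density assumed to be 1.05 g/cm^3
example_bmi = body_mass_index(70.0, 175.0)  # 70 kg, 175 cm
```

Height is converted from centimeters to meters inside the function because the stadiometer reading is recorded in cm, whereas BMI is expressed in kg/m².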

Statistical Analysis
Three statistical approaches were used to evaluate stability. First, Pearson's correlation coefficients, paired t-tests, and Cronbach's alphas were used to show how consistent each assessment was across trials [17]. Second, Cohen's kappas were used after transforming each variable into quartiles to assess the amount of categorical agreement across trials [18,19]. Third, Bland and Altman plots and limits of agreement (LOA) were constructed to evaluate the spread and pattern of mean differences across trials [20]. All analyses were conducted using SAS version 9.4 [21].

Results
Table 1 contains descriptive statistics and Pearson correlation coefficients for the test-retest study. Mean differences (SD) for SF (%), WC (cm), BMI (kg/m²), and HH (%) were -0.07 (1.52), 0.39 (3.17), -0.03 (0.53), and -0.21 (2.19), respectively. Test-retest correlations were all very strong (rs ≥ .95, ps < .001) with non-significant t-tests (ps > .05). Cronbach's alphas were also very strong (αs > .97) and significantly greater than .70 (reliability cutoff) (ps < .05). Table 2 contains the categorical agreement statistics for stability. Weighted kappas were strong for SF, BMI, and HH (κs ≥ .92) and moderately strong for WC (κ = .71).
Figure 1 contains the LOA plots for test-retest data on each BC assessment. LOA plots are constructed so that the vertical axis represents the difference (i.e., trial 1 - trial 2) between the two BC assessment trials. Thus, a value located on the horizontal zero line indicates that the participant received the same BC score on both trials. The farther a value lies (vertically) from the horizontal zero line, the larger the difference between the two BC trial scores. For example, a value located at the +2.0 vertical position in the SF LOA plot indicates that the individual received an SF value 2.0 percentage points higher on the first trial than on the second trial.
Overall, none of the four LOA plots showed systematic bias in either direction (i.e., scatter was equally distributed above and below the horizontal zero line).
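The LOA described above can be sketched as follows. This is a minimal illustration that assumes the conventional Bland and Altman form, mean difference ± 1.96 × SD of the differences, with the sample SD. The source reports using SAS, so this Python function and its name are illustrative only, and the example scores are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(trial1, trial2, z=1.96):
    """Return (mean difference, lower LOA, upper LOA) for paired trial scores.

    Differences are computed as trial 1 - trial 2, matching the vertical axis
    of the LOA plots. z = 1.96 assumes approximately normal differences.
    """
    if len(trial1) != len(trial2):
        raise ValueError("trials must contain the same number of scores")
    if len(trial1) < 2:
        raise ValueError("at least two paired scores are required")
    diffs = [a - b for a, b in zip(trial1, trial2)]
    bias = mean(diffs)      # systematic difference between trials
    sd_diff = stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd_diff, bias + z * sd_diff


# Hypothetical percent body fat scores for four participants (not study data).
trial_one = [20.0, 25.0, 30.0, 18.0]
trial_two = [19.0, 26.0, 29.0, 19.0]
bias, lower, upper = limits_of_agreement(trial_one, trial_two)
```

Under the same assumption, the summary values reported for WC (mean difference 0.39 cm, SD 3.17 cm) give a half-width of 1.96 × 3.17 ≈ 6.2 cm, which is consistent with the WC LOA reported in this study.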

Additionally, in all LOA plots at least 95% of differences fell within the limits. WC LOA were clinically large (±6.2 cm) (Table 3). However, after the removal of two WC outliers, WC LOA became reasonable (±3.8 cm) (not shown). Finally, Table 4 contains the standard error of measurement (SEM) values for the four BC assessments. An SEM is similar to other standard errors but specifically describes the variability we might expect in an individual's score [22]. Given this, an SEM can be used as a measure of the reliability of a single measurement, where the smaller the SEM, the greater our confidence in the score. For example, note that the SEM for BMI is 0.26 kg/m². This value can be used to form a 95% confidence interval (CI), similar to a prediction interval in regression analysis. Thus, a college student with a measured BMI of 24.0 kg/m² could be 95% confident that their true BMI lies in the interval 23.5-24.5 kg/m². The X values and CIs in Table 4 are examples only, whereas the SEMs are constants estimated from the study.
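The BMI example above can be reproduced with a short sketch. It assumes the interval is formed as observed score ± 1.96 × SEM (a normal approximation); the source does not spell out the formula, although this form does reproduce the reported interval. The function name is hypothetical.

```python
def sem_confidence_interval(observed_score, sem, z=1.96):
    """Return (lower, upper) bounds of a CI around a single observed score.

    Assumes measurement error is approximately normal, so z = 1.96
    corresponds to a 95% interval.
    """
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    margin = z * sem
    return observed_score - margin, observed_score + margin


# BMI example from the text: observed BMI 24.0 kg/m^2, SEM 0.26 kg/m^2.
lower, upper = sem_confidence_interval(24.0, 0.26)
rounded_interval = (round(lower, 1), round(upper, 1))
```

Rounded to one decimal place, the bounds agree with the 23.5-24.5 kg/m² interval given above.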

Discussion
The aim of this study was to examine the test-retest stability of four common BC assessments in college students. The results support adequate test-retest stability of these field-based BC assessments in college students. These findings have considerable implications. For example, measurement theory assumes that scores from an assessment are reliable only under particular situations [22]. That is, assessments found reliable in general populations are not necessarily reliable in college students. Many factors common on college campuses can affect an assessment's stability, such as fatigue, practice, subject variability, testing circumstances, and precision of measurement. The present results suggest that such factors do not impede the stability of common field-based BC assessments in college students. The limitations of a study's design should always be considered before generalizing its findings.
One such limitation was the specific population from which the sample was drawn. As previously stated, because BC scores are situation specific, results from this study should be considered only for college students attending a rural public university. Therefore, the strong reliability evidence found in this study should not be generalized to other populations. A second limitation was the relatively small sample size. A larger sample might have provided more variability in BC measures and in turn allowed for the inspection of possible patterns and bias in the LOA plots. Nevertheless, it should also be noted that larger samples in a repeated measures design with four different BC tests demand considerable time and effort from both the researcher and the participants.

Conclusion
Results of this study provide evidence for acceptable test-retest stability of common field-based BC assessments in college students. Practitioners and researchers who assess BC in college students using field-based techniques should be aware that the measurement error attributable to different time points is negligible in this population.