The aim of this study was to investigate how different combinations of sample size and number of repeated observations within individuals influence the SEM, and thus the ability to quantify the uncertainty of the population estimate of physical activity with confidence intervals or to detect group differences with, for example, a t-test. This was done using four subsets mimicking different accelerometer-based measurement protocols: all (21–28) valid days, seven random days of measurement, the first seven days of measurement, and three random days drawn from the first seven days. The results show that, to obtain as low an SEM as possible, it is more efficient, in terms of the total number of observed days, to maximize the number of subjects and to keep the number of repeated observations within each subject close to one. Lee (2018) reached the same conclusion when investigating how, given a fixed number of accelerometers, varying the number of days and subjects influenced the number of days needed within the ICC framework, i.e. for level 3 questions (28). There appears to be a long-standing misconception that there is a “one size fits all” protocol for measuring physical activity with accelerometers. However, as this study shows, no single accelerometer-based protocol is suitable for all possible research questions. The current standard protocol of seven days of repeated measurement within each individual can most likely be traced back to one review, which is currently cited over one thousand times (4). The authors suggested that 3-5 days of monitoring was appropriate for measuring physical activity in adults and 4-9 days in children. They also stated that:
For investigators, the goal is to monitor activity for a sufficient number of days so that the resulting daily average reflects an individual's usual or habitual level of physical activity (4).
This statement holds true only if the research question is at level 4, identifying the habitual level of physical activity in an individual. It is less important for studies at levels 1-2, since individual day-to-day deviations will tend to cancel each other out across subjects (some subjects will be more active than their own average and some less active), and the group-level estimate will be valid if the sample size is sufficiently large (29). In addition, the studies included in that review were in fact all attempting to determine the number of repeated days of measurement needed to rank individuals according to their level of physical activity, i.e. level 3. This illustrates the misconception regarding what it means to assess the habitual physical activity of an individual. In other words, there is a difference between what is needed to describe the habitual physical activity of one person (n=1) and what is needed to rank this individual correctly according to their level of physical activity within a group of individuals (n>1). The latter situation (level 3) is the one that most previous research has dealt with, and the difference between the two can be illustrated as follows. To answer level 3 questions, one first calculates an ICC (or another appropriate coefficient, such as the generalizability coefficient in G-theory). As a second step, this coefficient is entered into the Spearman-Brown prophecy formula, which estimates the number of repeated observations (days) needed to rank individuals correctly with a desired reliability:
$$N = \frac{\mathrm{ICC}_d\,(1 - \mathrm{ICC}_s)}{\mathrm{ICC}_s\,(1 - \mathrm{ICC}_d)} \qquad (2)$$

where $N$ is the number of repeated observations (days) needed, $\mathrm{ICC}_d$ is the desired reliability (e.g. 0.8) and $\mathrm{ICC}_s$ is the observed single-day reliability in the group. The outcome of that calculation based on the current sample can be found in the supplementary file.
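The two-step level 3 procedure can be sketched as follows. This is a minimal illustration, assuming a balanced design (every subject has the same number of valid days) and the one-way random-effects ICC(1), which is only one member of the ICC family; the function names and the example data are hypothetical and not taken from this study.

```python
import math

def icc_oneway(data):
    """ICC(1) from a one-way random-effects ANOVA (balanced design: every subject has k days)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square: spread of subject means around the grand mean
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: spread of each subject's days around their own mean
    msw = sum((x - m) ** 2 for row, m in zip(data, subject_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def spearman_brown_days(icc_single, icc_desired=0.8):
    """Equation 2: days needed to reach icc_desired, rounded up to whole days."""
    if not (0 < icc_single < 1 and 0 < icc_desired < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = icc_desired * (1 - icc_single) / (icc_single * (1 - icc_desired))
    # Round before ceil so floating-point noise (e.g. 4.000000000000001) does not add a day
    return math.ceil(round(n, 9))

# Hypothetical minutes of activity for three subjects observed on two days each
data = [[20, 40], [60, 80], [100, 120]]
icc = icc_oneway(data)
print(icc, spearman_brown_days(icc))
```

With a single-day ICC of 0.5 and a desired reliability of 0.8, `spearman_brown_days(0.5)` gives 4 days. Note that the result depends only on the ratio of variance components, which is why it cannot answer level 4 questions.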
This procedure does not generalize to estimating the number of observations needed to identify the habitual physical activity of a single individual.
This becomes obvious when looking at the formulas. Consider the following situation. Depending on which member of the larger family of ICCs is used, the ICC can be calculated, for example, as

$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$

where $\sigma_b^2$ is the between-subject variance and $\sigma_w^2$ is the within-subject variance. If $\sigma_b^2 = 100$ and $\sigma_w^2 = 25$, then ICC = 100/125 = 0.8. However, if $\sigma_b^2 = 10$ and $\sigma_w^2 = 2.5$, the ICC is still 10/12.5 = 0.8, even though the within-subject variation differs by a factor of 10.
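The arithmetic in this example can be checked directly. The sketch below is a minimal illustration, assuming independent days with constant within-subject variance; the helper `individual_halfwidth` is hypothetical and approximates the 95% half-width around one person's mean of seven days, which is what matters for identifying an individual's habitual level.

```python
import math

def icc(var_between, var_within):
    # ICC as the proportion of total variance that lies between subjects
    return var_between / (var_between + var_within)

def individual_halfwidth(var_within, days, z=1.96):
    # Approximate 95% half-width around one person's mean of `days` independent days
    return z * math.sqrt(var_within / days)

for var_b, var_w in [(100, 25), (10, 2.5)]:
    print(f"ICC = {icc(var_b, var_w):.2f}, half-width over 7 days = {individual_halfwidth(var_w, 7):.2f}")
```

Both scenarios report ICC = 0.80, while the half-width for an individual differs by a factor of √10 (about 3.16), so the same ICC can correspond to very different precision at the individual level.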
In both situations the ICC is identical, and by extension so is the outcome of the Spearman-Brown prophecy formula. However, the within- and between-subject variation differ by a factor of 10 between the situations, which will certainly change the ability to identify the habitual physical activity of an individual. The habitual level of physical activity can be defined, slightly modified from Lui et al.'s version for diet, as “the hypothetical average around which that individual's physical activity varies” (30). To estimate the habitual level of an individual it is therefore necessary to estimate the within-subject coefficient of variation, that is, how much an individual fluctuates around their true, but unmeasured, mean level of physical activity. That value is then entered into equation 3:
$$D = \left(\frac{Z_\alpha \, CV_w}{D_0}\right)^2 \qquad (3)$$

where $D$ is the number of days of monitoring needed, $Z_\alpha$ is the normal deviate corresponding to the proportion of time the observed value should fall within the specified limit (e.g. 1.96 for 95%), $CV_w$ is the within-subject coefficient of variation, and $D_0$ is how close to the “true” habitual level the observed value should fall (e.g. 20%). The result is interpreted as the number of repeated observations needed for the observed value to fall within ±20% of the true habitual level 95% of the time.
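Equation 3 is simple to evaluate. The sketch below is a minimal illustration, assuming $CV_w$ and $D_0$ are both expressed in percent; the function name and the $CV_w$ value of 50% are hypothetical and are not results from this study.

```python
import math
from statistics import NormalDist

def days_for_habitual_level(cv_within, d0, confidence=0.95):
    """Equation 3: D = (Z_alpha * CV_w / D0)^2, rounded up to whole days.

    cv_within and d0 are both in percent (e.g. 50 and 20).
    """
    # Two-sided normal deviate, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    d = (z * cv_within / d0) ** 2
    return math.ceil(d)

# Hypothetical within-subject CV of 50%, observed value within ±20% of the habitual level
print(days_for_habitual_level(cv_within=50, d0=20))
```

Here $(1.96 \times 50 / 20)^2 \approx 24.0$, which is rounded up to 25 days. Unlike equation 2, the answer scales with the absolute within-subject variation rather than with a variance ratio.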
We have previously published work that determined how many days are needed to estimate the usual physical activity of an individual (level 4), showing that, for most intensity levels, considerably more days are needed than for research questions at level 3 (22).
Thus, if the results of the present study are combined with previous studies in this field (5-19, 22), a more nuanced picture than “one size fits all” emerges for accelerometer-based physical activity assessment protocols. Depending on the research question, the researcher needs to make several decisions, including the number of subjects to include in the study and the number of repeated observations within each subject. The optimal accelerometer-based assessment protocol must also factor in other considerations, such as the burden on subjects of wearing accelerometers and the cost of including either more subjects or more days per subject. The researcher must determine the best protocol given all of these circumstances, and the present study, together with the other studies in the field, should make it easier to make informed decisions on these questions.
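The trade-off between subjects, days and cost can be explored with a small search. This is a minimal sketch, assuming a balanced one-way random-effects model in which the variance of the group mean is $\sigma_b^2/n + \sigma_w^2/(nk)$; the budget, costs and variance components below are hypothetical, and real protocols would also need to weigh subject burden.

```python
def sem_group_mean(var_between, var_within, n_subjects, days_per_subject):
    # Balanced random-effects model: Var(mean) = var_b / n + var_w / (n * k)
    var = var_between / n_subjects + var_within / (n_subjects * days_per_subject)
    return var ** 0.5

def best_allocation(budget, cost_per_subject, cost_per_day, var_between, var_within, max_days=28):
    """Brute-force search over days per subject; returns (subjects, days, SEM) with the lowest SEM."""
    best = None
    for k in range(1, max_days + 1):
        # Number of subjects affordable if each is monitored for k days
        n = int(budget // (cost_per_subject + cost_per_day * k))
        if n < 2:
            continue
        sem = sem_group_mean(var_between, var_within, n, k)
        if best is None or sem < best[2]:
            best = (n, k, sem)
    return best

# Hypothetical: recruiting a subject costs 100 units, each monitored day costs 1 unit
print(best_allocation(budget=10000, cost_per_subject=100, cost_per_day=1, var_between=100, var_within=25))
```

When a monitored day costs as much as recruiting a subject, the search returns one day per subject, in line with the main result; when recruitment is expensive (100 vs 1 here) it returns five days per subject. This matches the classical two-stage sampling optimum $k^* = \sqrt{(\sigma_w^2/\sigma_b^2)(c_s/c_d)}$, where $c_s$ and $c_d$ are the costs per subject and per day.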
The study population could be viewed as a potential limitation, since participants were not selected at random and were more active than the general Swedish population (31). However, the main conclusion of the study does not depend on the population studied or its level of physical activity: the same conclusion, that it is more efficient to reduce the SEM by including more subjects than by increasing the number of repeated observations within each subject, would have been reached had we simulated the data from scratch. The level of the SEM will, however, change between populations, and it is therefore important to base power calculations on high-quality information about the relevant population before designing a study.
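The claim that simulated data would lead to the same conclusion can be illustrated with a small Monte Carlo sketch. It assumes normally distributed subject levels and independent, normally distributed daily deviations with hypothetical variances of 100 (between) and 25 (within), and it ignores day-of-the-week and seasonal effects; the function name is hypothetical.

```python
import random
import statistics

def simulate_sem(n_subjects, days, var_between=100.0, var_within=25.0, reps=2000, seed=1):
    """Empirical SD of the group mean across simulated studies, i.e. an empirical SEM."""
    rng = random.Random(seed)
    sd_b, sd_w = var_between ** 0.5, var_within ** 0.5
    study_means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_subjects):
            # Subject's true habitual level, as a deviation from the population mean
            level = rng.gauss(0.0, sd_b)
            # Observed days scatter around that subject's own level
            total += sum(level + rng.gauss(0.0, sd_w) for _ in range(days))
        study_means.append(total / (n_subjects * days))
    return statistics.stdev(study_means)

# Same total of 140 observed days, allocated in two different ways
print(simulate_sem(140, 1), simulate_sem(20, 7))
```

Under these assumptions the analytic SEMs are $\sqrt{125/140} \approx 0.94$ for 140 subjects × 1 day and $\sqrt{100/20 + 25/140} \approx 2.28$ for 20 subjects × 7 days, and the simulated values should land close to these.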
Another issue beyond the scope of this study is the influence of other sources of variance on physical activity, such as day-of-the-week effects (7) and seasonal variation (32, 33), even if the latter may be trivial at the group level (34), or sources related to, for example, the gender and age distribution of the population. To obtain population estimates of physical activity levels, for example within a national monitoring system, it is important to consider these factors, and not only days versus subjects, both when choosing an appropriate protocol and when sampling study participants from the target population. The best way to do so may be to use a simple single sample selection procedure rather than forcing any particular combination (35).