PHRESH is an ongoing study of two low-income and predominantly African American urban communities in Pittsburgh, PA. To assess whether neighborhood-level changes impact residents' health and well-being, including diet, exercise, sleep, heart health, and cognitive health, we conducted three assessments of the physical and social environment in the two neighborhoods over a period of five years (2012–2017). These data were examined to identify correlates of, and the extent to which neighborhood-level changes affected, obesogenic behaviors such as physical activity, sleep, and heart health. Our neighborhoods are representative of other urban, disadvantaged areas that may benefit from improvements in the environment, which is a promising approach in the fight against chronic diseases. To our knowledge, this is the first study to conduct systematic longitudinal observations of the environment. In this paper, we have provided guidance on sampling, data collection and reliability assessment. In addition, we include results of repeated reliability testing to determine whether this audit tool and its standard set of items have enough stability across time to detect change.
As with any research endeavor, sampling was a critical step. Earlier work had demonstrated that a 25% sample of street segments produced valid estimates of the built environment [52]. When assessing neighborhood-level change, one difficulty is that these changes can modify the underlying street network. Our experience suggests that secondary sources of data may include non-negligible errors potentially due to lags in updating secondary databases. Whenever feasible (e.g. in a compact environment), we recommend careful verification of any available listing of street segments in the neighborhoods to ensure high accuracy. It is equally important to update the street network at each assessment wave to understand the degree of change in the street network. To reflect ongoing changes in the street network, we carefully identified and sampled new street segments at each wave. When sampling new segments, there should be systematic rules in place. For instance, when an entire street segment was demolished, should the replacement come from the same geographic area or be sampled entirely at random? Should a newly bisected street count as two new streets, or as the same street segment from a prior wave? A changing street network also made segment-level panel analysis difficult; instead, it was more reasonable to identify a stable unit of analysis (e.g. a residential buffer of each study participant) for assessing change.
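The wave-to-wave sampling logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in PHRESH: the function names, the segment identifiers, and the rule that new segments are sampled at the same 25% rate as the baseline are all assumptions made for the example.

```python
import math
import random


def sample_segments(segment_ids, fraction=0.25, seed=0):
    """Draw a simple random sample covering `fraction` of the listed segments."""
    rng = random.Random(seed)
    ids = sorted(segment_ids)  # sort so the draw is reproducible for a given seed
    n = math.ceil(fraction * len(ids))
    return set(rng.sample(ids, n))


def update_sample(prev_sample, prev_network, current_network, fraction=0.25, seed=1):
    """Carry a sample forward to the next wave (assumed rule, for illustration).

    Sampled segments that no longer exist are dropped, and segments that are
    new to the network are sampled at the same fraction as the baseline.
    """
    retained = prev_sample & current_network
    new_segments = current_network - prev_network
    added = sample_segments(new_segments, fraction, seed)
    return retained | added


# Hypothetical street network: 100 segments at baseline.
network1 = {f"seg{i:03d}" for i in range(100)}
baseline = sample_segments(network1)

# By the next wave, 10 segments were demolished and 8 new ones were built.
demolished = {f"seg{i:03d}" for i in range(10)}
built = {f"new{i}" for i in range(8)}
network2 = (network1 - demolished) | built

follow_up = update_sample(baseline, network1, network2)
```

Writing the replacement rule down as code forces the questions raised above (where replacements come from, how a bisected street is counted) to be answered explicitly before fieldwork begins.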
Thorough and consistent training of data collectors at every wave was also important. Ideally, each round of training should employ the same methods and trainer to avoid systematic biases in ratings across waves. During training, it was important to balance classroom learning with ‘live’ practice. In the classroom, the use of visuals (e.g. photographs) to demonstrate each item worked well. Field practice focused on individual sections of the audit tool and presented a variety of observations. It was important to budget enough time to allow data collectors to discuss questions and difficult situations with the trainer. Thus, the training schedule needed to be flexible to allow extra time for hard-to-assess items. Furthermore, we found field practice to be the most valuable part of training. When selecting individual data collectors, attention to detail was an important trait. Early on in PHRESH, we also integrated a community-engaged research framework to ensure the study's longevity and success [41].
Assessment of (inter-rater) reliability or agreement of individual SSA items helped identify which items performed well at a single timepoint and across time. A majority of SSA items (81%) had high reliability. Low agreement indicated items that were difficult to rate objectively or with a single observation. For example, “amount of litter” or “adults loitering, congregating or hanging out” may vary even over a short window of time (e.g. a few hours or a day). In the case of trash, we re-assessed agreement for a small subset of street segments in the reliability study where two observations were conducted within hours of each other. However, the agreement for trash or litter did not improve. Items with substantial temporal variation may require multiple ratings (more than two) to capture meaningful levels of variation. Certain items (e.g. perceived safety) are inherently subject to interviewer interpretation and, as might be expected, demonstrated lower agreement. Certain neighborhood features were likely easy to miss across an entire block or street segment (e.g. bars on a single window, cigarette butts on the ground, garden bed/planter), or difficult to assess from the outside (e.g. public/communal space, vacant building), as was necessary according to the audit protocol.
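To make the idea of item-level agreement concrete, the sketch below computes two common two-rater statistics for a single item: the proportion of segments on which the raters agree, and Cohen's kappa, which corrects that proportion for chance. Cohen's kappa is used here only as a familiar stand-in and is not necessarily the statistic reported for the SSA items; the function names and ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Proportion of segments on which two raters gave the same rating."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if the two raters rated independently
    # with their own observed category frequencies.
    pe = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if pe == 1.0:
        return float("nan")  # undefined when both raters used a single category
    return (po - pe) / (1 - pe)


# Hypothetical ratings of one dichotomous item on ten street segments.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

po = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

In this made-up example the raters agree on 8 of 10 segments, and the chance-corrected value is lower than the raw proportion because half of that agreement would be expected by chance alone.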
Given the study findings, we can suggest the types of items that may be able to capture change. Consistent with previous research, more subjective measures are less reliable than more objective (observable) ones [39]; dichotomous ratings have higher reliability than ordinal response scales (although the greater number of response categories may be valuable for providing finer distinctions). Large, visible items (e.g. buildings, traffic signs) were consistently reliable. While sidewalks are an important feature of the walking environment, sidewalk conditions can change quickly over a city block, making them challenging to rate consistently. Also, rare or low-prevalence features (see supplemental Table 1) did not lend themselves well to KA testing. For example, the only Gathering places in the study neighborhoods with prevalence above 5% were churches. If the low-prevalence items were readily identified, the PO statistic showed consistency in endorsing their absence.
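One way to screen items before reliability testing is to tabulate each item's prevalence alongside its observed agreement. The sketch below is illustrative only: the 5% cutoff is borrowed loosely from the example above as a placeholder, and the item names, ratings, and function names are hypothetical.

```python
def prevalence(ratings):
    """Share of segments on which a dichotomous item was endorsed (1 = present)."""
    return sum(ratings) / len(ratings)


def observed_agreement(r1, r2):
    """Proportion of segments on which two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def screen_items(ratings_by_item, min_prevalence=0.05):
    """Summarize each item and flag those below an assumed prevalence cutoff."""
    report = {}
    for item, (r1, r2) in ratings_by_item.items():
        prev = (prevalence(r1) + prevalence(r2)) / 2  # average across the two raters
        report[item] = {
            "prevalence": prev,
            "observed_agreement": observed_agreement(r1, r2),
            "rare": prev < min_prevalence,
        }
    return report


# Hypothetical paired ratings on 100 street segments.
ratings = {
    "church": ([1] * 10 + [0] * 90, [1] * 10 + [0] * 90),
    "monument": ([1] + [0] * 99, [0] * 100),
}
report = screen_items(ratings)
```

In this made-up data the rare item shows very high observed agreement even though the raters never agree that it is present, which is why agreement on absence alone says little about whether such an item can register change.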
While some features of the environment may change, there are also features that should be time invariant. Yet, when we compared slope (“flat”, “slight hill”, “steep hill”) across years for a sub-group of street segments with three years of complete data, 22% of the segments had different values although slope is unlikely to change. Also, 10% of street segments were endorsed as having art/monument in 2012, while only 2% of segments had art/monument three years later (2015), which may point to some confusion over what constitutes art. Therefore, we recommend the use of only those SSA items with consistently good to excellent agreement at every wave to assess change. Future studies may be able to improve the measurement of these less reliable items through particularly detailed and intensive training or procedures (e.g., mapping out a visual area into a grid to more systematically inspect for broken windows), clearer rules and examples for determining whether something is a communal space, or by the addition of a “cannot determine” category to the form. Even subjective ratings might be improved if anchored through training or explicit item instructions (e.g. 1 = a place where you would not feel physically at risk of violence from another person if walking alone in daylight, etc.), and use of multiple raters to reduce individual rater idiosyncrasies.
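Both checks described in this paragraph are simple to automate. The sketch below, written under stated assumptions, computes the share of segments whose rating on a supposedly time-invariant item differs across waves, and keeps only items whose agreement meets a cutoff at every wave. The 0.75 cutoff is a placeholder rather than a value taken from the study, and all names and data are hypothetical.

```python
def share_inconsistent(values_by_segment):
    """Share of segments with complete data whose ratings differ across waves."""
    complete = {
        seg: vals
        for seg, vals in values_by_segment.items()
        if all(v is not None for v in vals)
    }
    if not complete:
        raise ValueError("no segments with complete data")
    changed = sum(1 for vals in complete.values() if len(set(vals)) > 1)
    return changed / len(complete)


def stable_items(agreement_by_item, threshold=0.75):
    """Items whose agreement meets the (assumed) cutoff at every wave."""
    return sorted(
        item
        for item, waves in agreement_by_item.items()
        if all(a >= threshold for a in waves)
    )


# Hypothetical slope ratings at three waves; None marks a missing rating.
slope = {
    "s1": ["flat", "flat", "flat"],
    "s2": ["flat", "slight hill", "flat"],
    "s3": ["steep hill", "steep hill", "steep hill"],
    "s4": ["flat", None, "flat"],
    "s5": ["slight hill", "slight hill", "slight hill"],
}
inconsistent = share_inconsistent(slope)

# Hypothetical agreement values for three items at three waves.
agreement = {
    "traffic sign": [0.95, 0.92, 0.90],
    "litter": [0.40, 0.50, 0.45],
    "slope": [0.80, 0.70, 0.85],
}
keep = stable_items(agreement)
```

Requiring the cutoff at every wave, rather than on average, drops an item like the hypothetical "slope" above, whose agreement dips at a single wave.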
To our knowledge, this study is the first to conduct repeated assessments of the built and social environment to assess change. The PHRESH study’s SSA tool is reliable and practical to implement: it required an average of 13 minutes per street segment, and trained data collectors found it easy to use. While the audit tool provides rich and detailed data on environmental features, it is important to explore which of these features correlate with our health outcomes and whether these relationships are consistent over time. Due to the compact nature of our neighborhoods, we also need to test this audit tool in neighborhoods with greater variation, as certain items exhibited low or zero prevalence in the study neighborhoods. The next steps are also to develop and test indices that summarize features of the neighborhoods that may be predictors of the study outcomes. If valid indices of environmental features can be derived, they will be useful in guiding public policy, urban planning and redesign in the creation of built environments that promote health.