We used data from two complementary studies with samples designed to be representative of the non-institutionalized adult population in the United States. The primary study, the Tufts Equity in Health, Wealth, and Civic Engagement Study (TES; n = 1449 in 2021; n = 1831 in 2022), provided rich data on many household exposures and COVID-19 outcomes. The secondary study, the Household Pulse Survey (HPS; n = 147,380 in 2021; n = 62,826 in 2022), provided a very large data set with which to examine household size and COVID-19 outcomes. Our study was determined to be exempt by the Tufts University Social, Behavioral, and Educational Research Institutional Review Board (protocol STUDY00000428; for analysis of TES data) and to be Not Human Subjects Research by the Tufts University Health Sciences Institutional Review Board (protocol STUDY00001768; for analysis of HPS data).
Tufts Equity in Health, Wealth, and Civic Engagement Study (TES): Data Collection and Study Sample
Methods for the TES have been published previously [25]. Briefly, the survey for this study was administered through the Ipsos Public Affairs KnowledgePanel. Initial recruitment into the panel was probability-based. Ipsos also uses stratified random sampling methods to maximize geodemographic representativeness of the target US adult population. Once individuals are recruited into the panel, they complete a Core Profile Survey. Subsequently, they become eligible for separate surveys, such as the TES survey. For the present analysis, we used data from 1449 randomly selected KnowledgePanel individuals participating in the second survey wave (April 23–May 3, 2021; 69% completion rate) and 1831 individuals participating in the third survey wave of this study (May 26–June 2, 2022; 66% completion rate). Of these individuals, 1007 participated in both survey waves two (2021) and three (2022). In survey wave three, 760 participants who had not participated in a previous survey wave were selected as part of the study team’s effort to oversample Hispanic, non-Hispanic African American/Black, and non-Hispanic Asian populations. These participants were randomly selected from among the Hispanic, non-Hispanic African American/Black, and non-Hispanic Asian KnowledgePanel participants.
TES: Housing Characteristics
Participants self-reported household characteristics including household size (1–2 people/3–4 people/5 or more people), whether they had ever noticed visible water damage, visible mold, musty smells, or moldy smells (yes/no), how often someone (they or others) smoked inside the home (not at all/several days or more frequently in a typical week), the year their housing unit was originally built (asked in 2022 only; 1977 and earlier/1978 and later), whether in the last summer their housing unit was so hot for 24 hours or more that they were uncomfortable (2022 only), whether in the last winter their housing unit was so cold for 24 hours or more that they were uncomfortable (2022 only), and use of candles in the home (2022 only). Additionally, participants stated whether any of the following aspects of their housing negatively affected their health: indoor air quality; indoor temperature; water quality; pests like insects, rodents, or other vermin; physical condition of housing – state of repair; lack of privacy; household members who are exposed to COVID-19 at work or school (2021 only); outdoor air pollution; nighttime noise from road traffic; nighttime noise from aircraft or railways; neighborhood crime or drug activity; lead paint (2022 only); cost of housing (2022 only); noise from industrial activity or construction (2022 only); or none of the above. In 2021 only, participants reported whether they regularly used any of the following heating sources during cold weather months: central gas heating, gas, or oil-fired furnace, or gas or oil-fired boiler; air-source or ground-source heat pump; gas wall heater, space heater, or freestanding combustion heater such as gas or kerosene; electric wall heater, space heater, or freestanding electric heater; wood stove or a gas or wood fireplace; none of the above; or don’t know.
Similarly, in 2021, participants reported whether they typically used any of the following cooling sources: central air conditioning; window/wall air conditioning unit(s); fans; open windows; central or room humidifier; evaporative cooling systems; central high efficiency particulate air (HEPA) or electrostatic filter; none of the above; or don’t know. We treated the heating sources question as a proxy for exposure to combustion sources and the cooling sources question as a proxy for ventilation.
TES: COVID-19 Experiences
We created three dichotomous COVID-19 outcome variables: COVID-19 probable or actual diagnosis, COVID-19 diagnosis (2021 only), and COVID-19 vaccination status.
In the 2021 survey, participants were considered to have had a COVID-19 probable or actual diagnosis if they responded “yes” to either of two questions: “Have you ever tested positive for COVID-19?” or “Although you did not receive a positive test for COVID-19, do you believe you have ever had COVID-19?” The latter question was included due to difficulty in obtaining COVID-19 diagnostic tests early in the pandemic. Participants were considered not to have had a COVID-19 probable or actual diagnosis if they answered “no” to both questions or if they answered “no” to one question and had missing data for the other question. In 2022, participants were categorized as having had a COVID-19 probable or actual diagnosis based on their response to “Do you believe that you have had COVID-19?” (yes/no). Participants who answered “don’t know” were excluded from this analysis.
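The classification rules above can be illustrated with a minimal Python sketch. This is not the study's analysis code (analyses were run in Stata); the function names and the lowercase response strings are hypothetical, and the handling of respondents with both 2021 items missing is an assumption, since the text does not describe that case.

```python
from typing import Optional


def probable_or_actual_2021(tested_positive: Optional[str],
                            believes_had: Optional[str]) -> Optional[bool]:
    """2021 rule: "yes" to either item counts as a probable or actual diagnosis.

    Inputs are "yes", "no", or None (missing). Returns None when the
    respondent cannot be classified.
    """
    answers = (tested_positive, believes_had)
    if "yes" in answers:
        return True
    if "no" in answers:
        # Covers "no" to both items, and "no" to one item with the other missing.
        return False
    # Both items missing: treated as unclassifiable here (an assumption).
    return None


def probable_or_actual_2022(believes_had: Optional[str]) -> Optional[bool]:
    """2022 rule: single yes/no item; "don't know" (or missing) is excluded."""
    return {"yes": True, "no": False}.get(believes_had)
```

Returning None rather than False for unclassifiable respondents keeps exclusions distinct from negative responses, so they can be dropped before modeling.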
Participants were considered to have a COVID-19 diagnosis in the 2021 survey if they responded affirmatively to the question “Have you ever tested positive for COVID-19?” and were considered to not have had a COVID-19 diagnosis if they responded “no” to this question.
Participants were considered vaccinated or definitely willing to receive a COVID-19 vaccine in the 2021 survey if they either responded “Yes” to “Have you ever received a COVID-19 vaccine?” or “Very likely” to “How likely is it that you will get the vaccine when you are eligible?” (asked only of respondents who answered “No” or “Unsure” to “Have you ever received a COVID-19 vaccine?”). Participants who said “No” or “Unsure” to “Have you ever received a COVID-19 vaccine?” and who said they were “Somewhat likely,” “Not sure,” “Somewhat unlikely,” or “Very unlikely” to get a vaccine when they were eligible were considered unvaccinated against COVID-19. Participants were considered vaccinated in the 2022 survey if they reported that they had received one or more doses of a COVID-19 vaccine. Participants who reported that they received zero doses were considered unvaccinated. The 2022 definition differed from the 2021 definition because the whole population was not yet eligible to get vaccinated at the time of the 2021 TES survey.
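As an illustrative sketch of the two vaccination definitions (again in Python rather than the Stata used for the analyses), the following functions encode the 2021 and 2022 rules. The function names and argument conventions are hypothetical; response strings follow the wording quoted above, and missing or unexpected responses are returned as None by assumption.

```python
from typing import Optional

# Responses to the likelihood item that were classified as unvaccinated in 2021.
NOT_DEFINITELY_WILLING = {
    "Somewhat likely", "Not sure", "Somewhat unlikely", "Very unlikely",
}


def vaccinated_or_willing_2021(received: Optional[str],
                               likelihood: Optional[str]) -> Optional[bool]:
    """2021 rule: vaccinated, or "Very likely" to vaccinate once eligible."""
    if received == "Yes":
        return True
    if received in ("No", "Unsure"):
        if likelihood == "Very likely":
            return True
        if likelihood in NOT_DEFINITELY_WILLING:
            return False
    # Missing or unexpected responses: left unclassified (an assumption).
    return None


def vaccinated_2022(doses: Optional[int]) -> Optional[bool]:
    """2022 rule: one or more doses is vaccinated; zero doses is unvaccinated."""
    if doses is None:
        return None
    return doses >= 1
```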
TES: Covariates
Covariate data for participants in the TES were obtained through self-reported survey responses and through linkage with other data sets. Self-reported covariates included gender (female/male), age (continuous), race/ethnicity (Hispanic/non-Hispanic Black/non-Hispanic White/non-Hispanic other or at least two races), educational attainment (no high school diploma or equivalent/high school graduate or equivalent/some college or Associate’s degree/Bachelor’s degree/Master’s degree or higher), and annual household income (<$25,000/$25,000-$49,999/$50,000-$74,999/$75,000-$99,999/≥$100,000). Based on participants’ self-reported residential ZIP Codes, we used American Community Survey 5-year (2016–2020) estimates to generate median annual household income in 2020 inflation-adjusted dollars (S1901 from the US Census Bureau; <$25,000/$25,000-$49,999/$50,000-$74,999/≥$75,000) [26]. Based on participants’ county of residence, we assessed three additional variables – population density, air pollution exposure, and greenness exposure. Population density was derived using American Community Survey 5-year (2014–2018) estimates in B25010 and B01001 of the US Census Bureau data (<100/100–499/500–999/1000–1999/≥2000 people per square mile) [27,28]. Air pollution exposure was assessed by the Washington University in St. Louis Atmospheric Composition Analysis Group ( https://sites.wustl.edu/acag/datasets/surface-pm2-5/ ) as the 2018 annual average fine particulate matter (PM2.5; particles < 2.5 µm in aerodynamic diameter) based on publicly available North American-specific models derived by combining satellite (aerosol optical depth; Terra and Aqua satellites) and ground-monitoring data [29,30]. 
Greenness exposure was assessed using 16-day composites of normalized difference vegetation index (NDVI; an indicator of photosynthetic activity in plants that ranges from −1 indicating water to 1 indicating dense green forests) derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor at 250 m × 250 m resolution onboard the Terra satellite (mean across the county of pixel-level non-negative maximum NDVI from April–September 2018) [31]. We dichotomized greenness exposure as high (NDVI > 0.6) or low (NDVI ≤ 0.6) [32].
Household Pulse Survey (HPS): Data Collection and Study Sample
The HPS is a large, nationally representative household survey conducted by the US Census Bureau in partnership with other federal agencies [33]. Participants were selected using the Census Bureau’s Master Address File January 2021 update (for 2021 data) or January 2022 update (for 2022 data), a database with approximately 145 million housing units. Of these housing units, 81% had at least one email address or cell phone number linked in 2021 and 90% had at least one email address or cell phone number linked in 2022 [34]. The present analysis uses data from two merged waves of Phase 3.1 (2021) and one wave from Phase 3.5 (2022). Details of the systematic sampling strategy and weighting procedures are provided elsewhere [34,35]. Briefly, we used data from 68,913 participants who responded to the Qualtrics survey between April 14 and April 26, 2021 (wave one; 6.6% response rate), 78,467 participants who responded to the survey between April 28 and May 10, 2021 (wave two; 7.4% response rate), and 62,826 participants who responded to the survey between June 1 and June 13, 2022 (6.2% response rate) [34,35]. These survey waves were selected because they were the closest in time to the TES data collection periods. We used the HPS to validate findings from the TES, as the HPS has a much larger sample.
HPS: Housing Characteristics
Participants in the HPS were asked to indicate a number for: “How many total people – adults and children – currently live in your household, including yourself?” We categorized this variable with the levels of 1–2 people, 3–4 people, and five or more people.
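The categorization of the reported household count can be sketched as follows. This is a hypothetical Python helper for illustration only; the function name is not from the study, and treating missing or non-positive counts as unclassifiable is an assumption.

```python
from typing import Optional


def household_size_category(total_people: Optional[int]) -> Optional[str]:
    """Map a reported household count to the three analysis categories."""
    if total_people is None or total_people < 1:
        # Missing or implausible values are left unclassified (an assumption).
        return None
    if total_people <= 2:
        return "1–2 people"
    if total_people <= 4:
        return "3–4 people"
    return "5 or more people"
```

The same three levels are used for the TES household size variable, which allows the exposure to be compared across the two studies.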
HPS: COVID-19 Experiences
We used two COVID-19 outcomes – COVID-19 diagnosis and COVID-19 vaccination status.
For COVID-19 diagnosis in 2021, participants were asked: “Has a doctor or other health care provider ever told you that you have COVID-19?” For COVID-19 diagnosis in 2022, participants were asked: “Have you ever tested (using a rapid point-of-care test, self-test, or laboratory test) positive for COVID-19 or been told by a doctor or other health care provider that you have or had COVID-19?” We included participants who responded “yes” or “no” in these analyses.
For COVID-19 vaccination status in 2021, participants who indicated “yes” to “Have you received a COVID-19 vaccine?” or “definitely get a vaccine” to “Once a vaccine to prevent COVID-19 is available to you, would you…” were considered vaccinated or definitely willing to vaccinate. Participants who answered “no” to “Have you received a COVID-19 vaccine?” and any of “probably get a vaccine,” “be unsure about getting a vaccine,” “probably NOT get a vaccine,” or “definitely NOT get a vaccine” to “Once a vaccine to prevent COVID-19 is available to you, would you…” were considered unvaccinated. For COVID-19 vaccination status in 2022, we categorized people based on their response to “Have you received a COVID-19 vaccine?” (yes/no).
HPS: Covariates
Self-reported covariate data for participants in the HPS included gender (female/male), age (continuous), race/ethnicity (Hispanic/non-Hispanic Black/non-Hispanic White/non-Hispanic other or at least two races), educational attainment (no high school diploma or equivalent/high school graduate or equivalent/some college or Associate’s degree/Bachelor’s degree/above a Bachelor’s degree), annual household income (<$25,000/$25,000-$49,999/$50,000-$74,999/$75,000-$99,999/≥$100,000), and residence in a metropolitan statistical area (yes/no for any of: Atlanta-Sandy Springs-Alpharetta (GA), Boston-Cambridge-Newton (MA-NH), Chicago-Naperville-Elgin (IL-IN), Dallas-Fort Worth-Arlington (TX), Detroit-Warren-Dearborn (MI), Houston-The Woodlands-Sugar Land (TX), Los Angeles-Long Beach-Anaheim (CA), Miami-Fort Lauderdale-Pompano Beach (FL), New York-Newark-Jersey City (NY-NJ), Philadelphia-Camden-Wilmington (PA-NJ-DE-MD), Phoenix-Mesa-Chandler (AZ), Riverside-San Bernardino-Ontario (CA), San Francisco-Oakland-Berkeley (CA), Seattle-Tacoma-Bellevue (WA), or Washington-Arlington-Alexandria (DC-VA-MD-WV)).
Statistical Analyses and Conceptual Model
All analyses were conducted in Stata/SE v17 and employed survey weighting to maximize the representativeness of the sample. All analyses for the two studies (TES and HPS) and the two years (2021 and 2022) were conducted separately to capture differences in housing conditions over time and differences in exposure-outcome associations over time. For each study, we first examined descriptive characteristics of the samples. We calculated weighted counts and proportions for all categorical variables, and we calculated means and 95% confidence intervals (95% CIs) for all continuous variables. Next, for the TES, we examined associations between heating sources and cooling sources. We created new exposure variables for pairs of heating and cooling sources that were significantly associated with each other (Pearson’s chi-squared p-value < 0.05) and for which at least 25 people (based on weighted counts) reported using both the heating source and the cooling source in the pair. Then, we estimated logistic regression models for the cross-sectional associations between each exposure variable and each outcome variable (separate models).
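To make the descriptive and pair-selection steps concrete, the following Python sketch computes weighted counts and proportions and applies the two retention criteria for heating-cooling pairs. It is an illustration under stated assumptions, not the Stata code used for the analyses: all function names are hypothetical, the p-value is taken as an input rather than computed (a design-based chi-squared test is not reproduced here), and missing values are simply skipped.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


def weighted_counts_and_proportions(
    categories: Sequence[Optional[Hashable]],
    weights: Sequence[float],
) -> Dict[Hashable, Tuple[float, float]]:
    """Return {category: (weighted count, weighted proportion)}."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for category, weight in zip(categories, weights):
        if category is None:  # skip missing responses (an assumption)
            continue
        totals[category] += weight
    grand_total = sum(totals.values())
    return {c: (t, t / grand_total) for c, t in totals.items()}


def weighted_joint_count(uses_heating: Sequence[bool],
                         uses_cooling: Sequence[bool],
                         weights: Sequence[float]) -> float:
    """Weighted number of respondents reporting both sources in a pair."""
    return sum(w for h, c, w in zip(uses_heating, uses_cooling, weights)
               if h and c)


def keep_pair(p_value: float, joint_count: float,
              alpha: float = 0.05, min_count: float = 25.0) -> bool:
    """Retain a pair if it is significantly associated and sufficiently common."""
    return p_value < alpha and joint_count >= min_count
```

For example, `keep_pair(0.01, 40.0)` retains a pair, whereas `keep_pair(0.01, 12.5)` drops it because too few respondents (by weighted count) reported both sources.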
To estimate associations between each household exposure (other than household size) and each outcome using data from the TES, we fit unadjusted models and two sets of adjusted models. The primary models were adjusted for covariates determined using an evidence-based directed acyclic graph (DAG) and included age, race/ethnicity, annual household income, county-level population density, county-level annual air pollution, and county-level greenness (Supplemental Fig. 1). As a sensitivity analysis, the second set of models was additionally adjusted for gender, educational attainment, and ZIP Code level median household income because these variables were identified through the DAG development process as related to the exposure and outcome (though not as part of the minimally sufficient adjustment set). The supplement provides details of the DAG development, along with the DAG code and the justification for each arrow.
To estimate associations between household size and the outcomes, we fit unadjusted and adjusted models (separate models for each outcome and study). The set of covariates in the adjustment set was determined based on the DAG (see Supplemental Text and Supplemental Fig. 1 for details). For models using the TES data, covariates included age, gender, race/ethnicity, educational attainment, annual household income, ZIP Code level median household income, and county-level population density. The adjustment set for the HPS was similar, except that we included residence in a metropolitan statistical area instead of county-level population density and we did not include ZIP Code level median household income (based on lack of available ZIP Code data).
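The adjustment sets described in this section can be summarized in a small lookup structure. The Python sketch below is illustrative only: the variable names, the dictionary keys, and the formula-string convention are hypothetical and do not correspond to the study's Stata variable names or code.

```python
# Primary adjustment set for TES housing exposures other than household size.
TES_PRIMARY = [
    "age", "race_ethnicity", "household_income",
    "county_pop_density", "county_pm25", "county_greenness",
]

ADJUSTMENT_SETS = {
    ("TES", "housing_exposure_primary"): TES_PRIMARY,
    # Sensitivity analysis: primary set plus three additional covariates.
    ("TES", "housing_exposure_sensitivity"): TES_PRIMARY + [
        "gender", "education", "zip_median_income",
    ],
    ("TES", "household_size"): [
        "age", "gender", "race_ethnicity", "education",
        "household_income", "zip_median_income", "county_pop_density",
    ],
    # HPS: metropolitan statistical area replaces county population density,
    # and ZIP Code level income is unavailable.
    ("HPS", "household_size"): [
        "age", "gender", "race_ethnicity", "education",
        "household_income", "msa_residence",
    ],
}


def model_formula(outcome: str, exposure: str, study: str, model: str) -> str:
    """Build an 'outcome ~ exposure + covariates' formula string."""
    covariates = ADJUSTMENT_SETS[(study, model)]
    return f"{outcome} ~ " + " + ".join([exposure, *covariates])
```

Writing the sets out this way makes the difference between the TES and HPS household size models explicit: the two lists share five covariates and differ only in the geographic and ZIP Code level terms.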