Long COVID and Symptom Trajectory in a Representative Sample of Americans

People who have COVID-19 can experience symptoms for months. Population studies of long COVID lack representative samples and longitudinal data that distinguish new-onset symptoms occurring with COVID from symptoms present before infection. We use a sample representing the U.S. community population from the Understanding America Study COVID-19 Survey, which surveyed around 8,000 respondents every two weeks from March 2020 to March 2021. Our final sample includes 308 infected individuals who were interviewed one month before, around the time of, and 12 weeks after infection. About 23% of the sample experienced new-onset symptoms during infection that lasted for more than 12 weeks, and thus can be considered to have long COVID. The most common persistent new-onset symptoms among those with long COVID were headache (22%), runny or stuffy nose (19%), abdominal discomfort (18%), fatigue (17%), and diarrhea (13%). Long COVID was more likely among obese individuals (OR = 5.44, p < 0.001) and those who experienced hair loss (OR = 6.94, p < 0.05), headache (OR = 3.37, p < 0.05), and sore throat (OR = 3.56, p < 0.05) during infection. Risk was unrelated to age, gender, race/ethnicity, education, current smoking status, or comorbid chronic conditions. This work provides national estimates of long COVID in a representative sample after accounting for pre-infection symptoms.


Introduction
While COVID-19 is usually thought of as an acute disease, we now know that some people with COVID experience a variety of post-acute health problems long after disease onset. This experience of long-term persistent symptoms has been termed "long COVID" [1,2]. Acute COVID typically lasts 3 weeks [2-4], but long COVID can last weeks, months, or longer [5]. Studies have reported widely varying prevalence of long COVID, ranging from 10% to over 90% (see Supplemental Table 1 for a summary of studies). Limitations in study design have made it difficult to obtain a clear estimate of the prevalence, symptoms, and associated risk factors of long COVID in the United States population.
Early evidence of long COVID was based on hospitalized COVID patients discharged in a number of countries. Since these samples reflected patients with worse disease outcomes than average, the estimated prevalence of long COVID was generally high, ranging from 50% to 90% [6-11]. However, hospitalized patients account for a very small proportion (about 5%) of COVID-19 cases [12], so focusing only on samples of discharged hospitalized patients provides a limited perspective on the experience of long COVID in the broader population. Studies using samples that combine hospitalized and non-hospitalized individuals from specific geographic regions generally reported a lower prevalence of long COVID than those focusing only on hospitalized patients, mostly ranging from 30% to 70% [1,13-20]. Notably, studies that include a greater proportion of hospitalized patients, survey more symptoms, and focus on a shorter time frame generally report a higher estimated long COVID prevalence. However, these estimates are not population representative.
Only two population-representative surveys have been conducted to date. In December 2020, the Office for National Statistics (ONS) estimated the prevalence of long COVID in the U.K. from a survey of 8,193 non-hospitalized and non-institutionalized respondents who ever tested positive for COVID during the survey follow-up: 21% exhibited symptoms lasting longer than 5 weeks, and 10% exhibited symptoms lasting longer than 12 weeks [21]. In June 2021, using data from the Real-time Assessment of Community Transmission-2 (REACT-2) Study, a nationally representative sample of the community population in England,

Data
We used data from the Understanding America Study (UAS) COVID-19 National Sample, an ongoing longitudinal national probability-based internet panel of approximately 9,000 non-institutionalized U.S. adults administered by the Center for Economic and Social Research (CESR) at the University of Southern California (USC). Respondents are recruited through random selection of households from a postal service address list covering all households in the United States. They answer the survey using a computer, tablet, or smartphone; respondents without internet access were provided with a tablet and broadband internet [30]. The UAS began administering the longitudinal COVID-19 national survey to its panel members in March 2020 [31]. Follow-up surveys were fielded every two weeks beginning April 1, 2020. In each 2-week survey period, one-fourteenth of the respondent pool was asked each day to fill out the survey within 2 weeks. In each wave, more than 90% of the responses were completed within two weeks, and most were completed on the day of assignment. The UAS COVID national survey data are weighted to be nationally representative.
The current study used the first 25 waves of the survey, which were collected from March 10, 2020 to March 31, 2021. During this period, 8,425 respondents participated in the survey, and 872 people (about 10% of all participants) reported that they had been diagnosed with, or tested positive for, COVID. We limited the analytic sample to the 310 respondents who had COVID during the study period and who also had self-reported symptom information at three times: 4 weeks before reporting a COVID diagnosis/positive test, at the time of that report, and 12 weeks after. After excluding 2 respondents with missing information on covariates, our final analytic sample consisted of 308 people. In other words, 564 respondents were dropped due to missing data. We believe that the missingness is largely random, since the characteristics of our final sample and of those we dropped do not differ significantly at the 95% confidence level (Supplemental Table 2). The only statistically significant difference is that those with missing data were more likely to be non-Hispanic other race/ethnicity (p = 0.031). Thus, we were able to examine long COVID in a sample representative of most people who got COVID.

Measures
COVID Diagnosis
The COVID-diagnosed population was identified from questions about both COVID tests and diagnoses. Tests were less available early in the survey period than later on. Participants were asked: "Have you been tested for coronavirus since the last time you took our coronavirus survey? If so, what was the result?" and "Whether or not you have had a coronavirus test, has a doctor or another healthcare professional diagnosed you as having or probably having the coronavirus since the last time you took our coronavirus survey?". We considered a respondent to have COVID if they either tested positive for SARS-CoV-2 or were diagnosed with COVID by a healthcare professional.

Self-Reported Symptoms
There is no official list of clinical symptoms defining long COVID. We examined the 18 self-reported symptoms included in the survey. At each wave, respondents were asked whether they had experienced the following symptoms in the past 7 days: (1) fever or chills, (2) runny or stuffy nose, (3) chest congestion, (4) cough, (5) sore throat, (6) sneezing, (7) muscle or body aches, (8) headaches, (9) fatigue or tiredness, (10) shortness of breath, (11) abdominal discomfort, (12) vomiting, (13) hair loss, (14) dry skin, (15) body temperature higher than 100.4°F (38.0°C), (16) diarrhea, (17) lost sense of smell, (18) skin rash. The response options were "Yes", "No", and "Unsure"; we treated "Unsure" as not reporting the symptom. A symptom count variable ranging from 0 to 18 was then generated for each respondent at each wave by summing the number of symptoms reported at that wave.
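The symptom-count step can be sketched in Python as follows. This is a minimal illustration, not the authors' Stata code. The symptom identifiers and the dictionary-of-answers input are hypothetical stand-ins for the UAS variables. As described above, "Unsure" is treated the same as "No", and so is any missing answer.

```python
from __future__ import annotations

# Hypothetical identifiers for the 18 survey symptoms (not actual UAS variable names).
SYMPTOMS = (
    "fever_or_chills", "runny_or_stuffy_nose", "chest_congestion", "cough",
    "sore_throat", "sneezing", "muscle_or_body_aches", "headache",
    "fatigue", "shortness_of_breath", "abdominal_discomfort", "vomiting",
    "hair_loss", "dry_skin", "temperature_over_100_4F", "diarrhea",
    "loss_of_smell", "skin_rash",
)


def reported_symptoms(answers: dict[str, str]) -> set[str]:
    """Symptoms answered "Yes"; "No", "Unsure", and missing answers count as not reported."""
    return {s for s in SYMPTOMS if answers.get(s) == "Yes"}


def symptom_count(answers: dict[str, str]) -> int:
    """Symptom count (0 to 18) for one respondent at one wave."""
    return len(reported_symptoms(answers))


print(symptom_count({"cough": "Yes", "headache": "Yes", "fatigue": "Unsure"}))  # 2
```

Working with the set of reported symptoms, rather than only the count, keeps the per-symptom information that the new-onset definition below relies on.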

Long COVID
We define long COVID as symptoms reported by the respondent at the time of diagnosis that were not present 4 weeks prior and that lasted 12 weeks or more. A respondent is considered to have long COVID if they reported at least one of the 18 symptoms at the time of diagnosis that they had not reported 4 weeks before COVID and that they still reported 12 weeks after. This definition distinguishes symptoms most likely caused by COVID from symptoms the respondent was already experiencing prior to infection.
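The definition reduces to a set operation on the symptoms reported at the three stages. Take the symptoms present at infection, remove those already present 4 weeks before, and keep only those still present 12 weeks after. The Python sketch below assumes each stage's reported symptoms are available as a set of names; the names are hypothetical.

```python
from __future__ import annotations


def persistent_new_onset(pre: set[str], infection: set[str], post: set[str]) -> set[str]:
    """Symptoms reported at infection, not reported 4 weeks before, still reported 12 weeks after."""
    new_onset = infection - pre  # symptoms that appeared around the time of infection
    return new_onset & post      # ...and persisted to the post-infection stage


def has_long_covid(pre: set[str], infection: set[str], post: set[str]) -> bool:
    """Long COVID: at least one persistent new-onset symptom."""
    return bool(persistent_new_onset(pre, infection, post))


# A runny nose already present before infection does not count; a new headache that persists does.
pre = {"runny_or_stuffy_nose"}
infection = {"runny_or_stuffy_nose", "headache", "fever_or_chills"}
post = {"runny_or_stuffy_nose", "headache"}
print(persistent_new_onset(pre, infection, post))  # {'headache'}
print(has_long_covid(pre, infection, post))        # True
```

Under this rule, a respondent whose only post-infection symptom was already present before infection (for example, pre-existing fatigue) is not classified as having long COVID.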

Existing Health Conditions
Because underlying medical conditions have been linked to elevated risk for severe illness from COVID-19 [32], we examine whether existing health conditions are associated with increased risk of long COVID. Participants were asked "Have you ever been told by a doctor, nurse, or other health professional that you have any of the following medical conditions?": (1) diabetes, (2) cancer (other than skin cancer), (3) heart disease, (4) high blood pressure, (5) asthma, (6) chronic lung disease such as COPD or emphysema, (7) kidney disease, (8) autoimmune disorder such as rheumatoid arthritis or Crohn's Disease, and (9) obesity. Each condition was treated as a binary variable in the analyses.

Other Covariates
Other covariates included age, gender, race/ethnicity, education level, and current smoking status. Age was categorized into three groups: 18 to 49, 50 to 64, and 65 and above. Race/ethnic groups included non-Hispanic White, non-Hispanic Black, Hispanic, and others. Education was classified as high school or less, some college without a bachelor's degree, and a bachelor's degree or more.

Statistical Analysis
We treated the survey wave in which respondents reported that they had tested positive for, or been diagnosed with, COVID as the time of infection. By design, the survey interval was 2 weeks. So, the wave 2 waves (roughly 4 weeks) before the infection wave was treated as the pre-infection stage, and the wave 6 waves (roughly 12-13 weeks) after it was treated as the post-infection stage.
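The stage alignment can be expressed as fixed wave offsets relative to the infection wave. The sketch below assumes a hypothetical input: a dictionary mapping integer wave numbers to the set of symptoms a respondent reported in that wave. A respondent missing any of the three required waves is dropped, which mirrors the sample restriction described in the Data section.

```python
from __future__ import annotations

from typing import Optional

PRE_OFFSET = -2  # 2 waves x 2 weeks = roughly 4 weeks before the infection wave
POST_OFFSET = 6  # 6 waves x 2 weeks = roughly 12 weeks after the infection wave


def align_stages(
    waves: dict[int, set[str]], infection_wave: int
) -> Optional[tuple[set[str], set[str], set[str]]]:
    """Return the (pre, infection, post) symptom sets, or None if any stage is missing."""
    needed = (infection_wave + PRE_OFFSET, infection_wave, infection_wave + POST_OFFSET)
    if any(w not in waves for w in needed):
        return None  # incomplete data: respondent is excluded from the analytic sample
    pre, infection, post = (waves[w] for w in needed)
    return pre, infection, post


# Hypothetical respondent who completed waves 1-25 and reported a cough only at wave 10.
waves = {w: set() for w in range(1, 26)}
waves[10] = {"cough"}
print(align_stages(waves, 10))  # (set(), {'cough'}, set())
print(align_stages(waves, 20))  # None, because wave 26 would be needed
```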
We first summarized the sample characteristics at the time of reported infection and compared them with the profile of the COVID-infected population in the United States provided by the Centers for Disease Control and Prevention (CDC) [33] to assess the generalizability of the survey results. We also compared the characteristics of the long COVID group with those of people who had COVID but not long COVID.
Next, we estimated the prevalence of long COVID in two ways: from reported current symptoms, for comparison with estimates from previous studies, and from new symptom onset (accounting for pre-infection symptoms). We compared our estimates with those of other nationally representative studies.
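The gap between the two prevalence measures can be illustrated with a toy example. The records below are invented for illustration and are not study data. The estimates are unweighted. The "naive" rule counts anyone with a symptom at the post-infection stage. The new-onset rule requires a symptom that first appeared at infection and persisted.

```python
from __future__ import annotations


def naive_rule(pre: set[str], infection: set[str], post: set[str]) -> bool:
    """Any symptom 12 weeks after infection, ignoring pre-infection symptoms."""
    return bool(post)


def new_onset_rule(pre: set[str], infection: set[str], post: set[str]) -> bool:
    """At least one symptom that was new at infection and still present 12 weeks later."""
    return bool((infection - pre) & post)


def prevalence(records, rule) -> float:
    """Unweighted share of (pre, infection, post) records that satisfy `rule`."""
    return sum(rule(*r) for r in records) / len(records)


# Invented records: (pre, infection, post).
records = [
    ({"sneezing"}, {"sneezing", "cough"}, {"sneezing"}),  # pre-existing symptom only
    (set(), {"headache"}, {"headache"}),                   # persistent new-onset symptom
    (set(), {"fever_or_chills"}, set()),                   # recovered
    ({"fatigue"}, {"fatigue"}, {"fatigue"}),               # pre-existing symptom only
]
print(prevalence(records, naive_rule))      # 0.75
print(prevalence(records, new_onset_rule))  # 0.25
```

The toy numbers show only the direction of the difference. Any respondent counted by the new-onset rule is also counted by the naive rule, so the naive estimate can never be lower.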
Then, we compared the proportions of our analytic sample reporting each symptom at the pre-infection, infection, and post-infection stages to show the overall recovery of the infected population. We then made the same comparison among those with long COVID to show how their symptoms, and the frequency of reported symptoms, changed over time. Based on our long COVID definition, we also ranked the persistent new-onset symptoms among those with long COVID.
Lastly, we used logistic regression models to identify sociodemographic and health-related risk factors associated with long COVID, including smoking, existing health conditions, and new-onset symptoms at the time of infection.
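Logistic regression coefficients are on the log-odds scale. The odds ratios reported in the Results therefore correspond to exponentiated coefficients, OR = exp(β), with a Wald 95% interval of exp(β ± 1.96·SE). The sketch below shows only this conversion. The standard error in the example is made up; in a survey-weighted model it would come from the design-based variance estimate.

```python
from __future__ import annotations

import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a logit coefficient and its standard error into an odds ratio and Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# A coefficient of ln(5.44) corresponds to the odds ratio reported for obesity;
# the standard error of 0.45 is a hypothetical value for illustration only.
or_, lower, upper = odds_ratio_ci(math.log(5.44), 0.45)
print(round(or_, 2))  # 5.44
```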
All results were weighted to be nationally representative, adjusting for differential sampling probabilities and survey nonresponse. All analyses were performed using Stata version 16.0. Informed consent was obtained from participants at the time of survey response. The current analysis follows the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) cohort reporting guidelines.
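In the prevalence estimates, weighting takes the form of a weighted proportion: the summed survey weights of respondents with the outcome, divided by the total weight. A minimal sketch with made-up weights follows (the actual UAS weights are distributed with the data). It computes only the point estimate, not design-based standard errors.

```python
from __future__ import annotations


def weighted_prevalence(outcomes: list[bool], weights: list[float]) -> float:
    """Weighted share of respondents with the outcome: sum(w * y) / sum(w)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for y, w in zip(outcomes, weights) if y) / total


# Made-up example: two of four respondents have long COVID, but they carry
# different weights, so the weighted estimate differs from the raw 50%.
outcomes = [True, False, False, True]
weights = [2.0, 1.0, 1.0, 0.5]
print(round(weighted_prevalence(outcomes, weights), 3))  # 0.556
```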

Results
Sample Characteristics at the Time of Infection
Table 1 shows sample characteristics at the time of reported SARS-CoV-2 infection. Our final sample had a mean age of 46 (third column of Table 1). More than half of the sample was female (57%), 61% was non-Hispanic White, 12% was non-Hispanic Black, and 22% was Hispanic. Both our final sample and the UAS sample of all COVID cases (second column of Table 1) are very similar in age, gender, and racial/ethnic distribution to COVID cases tracked by the CDC during the same timeframe [33] (first column of Table 1).

Table 1 Notes
* p < 0.05, ** p < 0.01, *** p < 0.001.
The CDC COVID case information is from the CDC COVID Data Tracker: https://covid.cdc.gov/covid-data-tracker/#demographics
For the UAS sample, the time of COVID diagnosis is considered baseline, and all percentages and means are weighted to be nationally representative.
In the final sample, about 41% had a high school education or less, 35% had some college, and 24% had a bachelor's degree or higher. Almost 30% of the respondents were current smokers. In terms of existing health conditions, 18% had diabetes, 5% had cancer, 9% had heart disease, 29% had high blood pressure, 19% had asthma, 5% had chronic lung disease, 4% had kidney disease, 5% had an autoimmune disorder, and 24% were obese. Half of the sample had none of these underlying conditions.
At the time of infection, 80% of the respondents were symptomatic, and the average symptom count was 6. Both the proportion symptomatic and the average symptom count were fairly similar between the UAS total COVID sample (the second column of Table 1) and our final analytical sample (the third column of Table 1). Persons with long COVID had more symptoms on average than those who recovered quickly (7.9 vs 5.4).
Compared with people who did not experience long COVID, the long haulers were significantly more likely to be obese (p = 0.004). In terms of new-onset symptoms, the long haulers were more likely to experience headache (p = 0.004), fever (p = 0.037), and runny or stuffy nose (p = 0.034).
[Figure 1 about here]

Symptoms Trend and Most Reported Symptoms
Similarly, Fig. 2 shows the proportions reporting each symptom at the three stages, but only among the COVID long haulers (n = 74). For many symptoms, the proportion peaked at the time of infection and then dropped, but remained higher at the post-infection stage than at the pre-infection stage.
Specifically, relative to the pre-infection level, the proportions reporting abdominal discomfort (p = 0.004), sore throat (p = 0.048), loss of smell (p < 0.001), and a body temperature higher than 100.4°F (p < 0.001) were significantly higher at the post-infection stage. Among those with long COVID, the most commonly reported symptoms at the post-infection stage included fatigue (50%), dry skin (46%), runny or stuffy nose (39%), headache (38%), and sneezing (35%). It is important to note that these most reported symptoms do not account for the pre-infection baseline level. They may be commonly reported partly because they are highly prevalent even without SARS-CoV-2 infection.
[Figure 2 about here]
To account for the pre-infection prevalence of the symptoms, Fig. 3 shows the prevalence of only persistent new-onset symptoms among those with long COVID at the post-infection stage. Because the long haulers started experiencing these symptoms at the time of infection, these symptoms are more likely to be related specifically to COVID. The most reported persistent new-onset symptoms were headache (22%), runny or stuffy nose (19%), abdominal discomfort (18%), fatigue (17%), and diarrhea (13%). Many symptoms differ between Fig. 3 and Fig. 2 in both ranking and prevalence.
[Figure 3 about here]

Predictors of Long COVID
Table 2 shows the logistic regression model predicting long COVID among our sample of 308 SARS-CoV-2-infected respondents. People who were obese (OR

Main Findings
Our results indicate that the estimated prevalence of long COVID in a population-representative sample depends on whether pre-infection symptoms are accounted for. In the U.S. population, most people with COVID return to their pre-infection symptom level after the acute phase of the disease. However, more than one-fifth (23%) experience long COVID, with at least one symptom originating around the time of SARS-CoV-2 infection and lasting for more than 12 weeks. Without adjusting for pre-infection symptoms, the prevalence is estimated at 40%, which suggests that previous studies may have substantially overestimated long COVID.
The most frequently experienced persistent new-onset symptoms among those with long COVID include headache (22%), runny or stuffy nose (19%), abdominal discomfort (18%), fatigue (17%), and diarrhea (13%). The fully adjusted logistic regression model indicates that the likelihood of experiencing long COVID is not significantly associated with sociodemographic or behavioral factors, including age, gender, race/ethnicity, education, and current smoking status, or with the presence of chronic conditions. COVID long haulers are more likely to experience hair loss, headache, and sore throat at the time of infection than their counterparts whose symptoms resolve more quickly. Also, those who are obese are at higher risk of experiencing persistent new-onset symptoms.
To our knowledge, this is the first study to define long COVID while accounting for pre-infection baseline symptoms. Even before SARS-CoV-2 infection, more than two-fifths (44%) of our sample experienced at least one symptom that could potentially be linked to COVID. Also, among the infected, the prevalence of most symptoms returned to the pre-infection level at the post-infection stage. This suggests that many symptoms people report both before and after COVID may be due to other conditions. So, while around 40% of COVID-infected persons have at least one symptom 12 weeks after COVID diagnosis, classifying all of these persons as having long COVID may overestimate its prevalence. The longitudinal nature of the data, from the pre-infection to the post-infection stage, made it possible to distinguish new-onset symptoms from the symptoms that might be experienced by someone without SARS-CoV-2 are different from the ones identified by Augustin et al. [16] (anosmia and diarrhea). This is probably because we used new-onset symptoms as the predictors in our regression model, whereas the previous study was not able to distinguish new-onset symptoms from those that started before SARS-CoV-2 infection.
By limiting our sample to those with complete data at the pre-infection, infection, and post-infection stages, we excluded those who were diagnosed in the last 5 waves of the survey. Thus, for respondents in our sample, the latest possible date of diagnosis was between November 25 and December 23, 2020 (the 19th survey wave). Since COVID vaccines were available to only a small number of health care workers at that time, we do not expect vaccination to change our estimate in any meaningful way, so we did not include information on vaccination status.

Limitations
Our study has limitations. We are likely to have missed some severe COVID cases, since these respondents probably would not have answered the survey while severely ill. Information on hospitalization is not available in the UAS. Since hospitalized COVID patients generally experience moderate or severe disease, it is reasonable to assume that they are more likely to have missing data in the UAS and to be excluded from our final sample. This would lead to an underestimate of the prevalence of long COVID. Given that around 5% of the SARS-CoV-2-infected population is hospitalized [12], and that long COVID is highly prevalent (50%-90%) among hospitalized patients [6-11], we believe that the true population prevalence may be somewhat higher than our estimate of 23%, in the range of 24% to 26%. Some limitations of our study stem from the nature of the secondary data we use. The UAS COVID National Survey does not include information on brain fog, which is considered a long COVID symptom, so we could not identify COVID long haulers whose only persistent symptom was brain fog. Finally, our assessment of long COVID is based on self-reports rather than clinical diagnoses, and we do not have a clear set of clinical indicators of long COVID. However, self-reported symptoms are still valuable for gaining insight into what is happening in the population, although they provide little information on pathology or mechanisms.
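One way to reproduce the 24% to 26% range is to assume the 23% estimate applies to non-hospitalized cases. That estimate is then mixed with the 50%-90% prevalence among hospitalized patients, weighted by the roughly 5% hospitalization share, giving p = (1 − h)·p_sample + h·p_hosp. This is our reading of the calculation, not a formula stated in the text.

```python
from __future__ import annotations


def adjusted_prevalence(p_sample: float, hosp_share: float, p_hosp: float) -> float:
    """Mix a non-hospitalized estimate with a hospitalized-patient prevalence.

    Assumes p_sample applies to the (1 - hosp_share) of cases who were not hospitalized.
    """
    return (1 - hosp_share) * p_sample + hosp_share * p_hosp


low = adjusted_prevalence(0.23, 0.05, 0.50)   # 0.95 * 0.23 + 0.05 * 0.50
high = adjusted_prevalence(0.23, 0.05, 0.90)  # 0.95 * 0.23 + 0.05 * 0.90
print(f"{low:.0%} to {high:.0%}")  # 24% to 26%
```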
With the availability of vaccines and the emergence of new variants, the nation has moved into new stages of the pandemic. The vaccinated population has tripled since the last wave of data used in the current study, and by March 2022, more than 65% of the total US population had been fully vaccinated [34]. The Omicron variant, which spreads more easily than the original virus and the Delta variant [35], emerged in the US in December 2021, and by February 2022 almost all new cases were driven by Omicron lineages [36]. It remains unclear how vaccination affects long COVID [4], and there is limited evidence on whether the Omicron wave has changed what we know about long COVID [37,38]. Nevertheless, long COVID is still a public health concern. More knowledge of its prevalence, persistent symptoms, and risk factors may help healthcare professionals allocate resources and services to help long haulers return to normal life.

Declarations Author Contributions
Qiao Wu performed the analyses and wrote the first draft. All the co-authors (Qiao Wu, Eileen Crimmins, and Jennifer Ailshire) outlined the paper and edited the text.

Data Availability Statement
The survey and data are available from University of Southern California Understanding America Study website: https://uasdata.usc.edu/index.php

Competing Interest Statement
The authors declare no competing interests.

Funding
This analysis is supported by the National Institute on Aging (P30 AG017265).

Figure 1
Percent with Self-Reported Symptoms at Pre-Infection, Infection, and Post-Infection Stages among the COVID-Infected (n=308)

Notes
The pre-infection stage is 4 weeks before the COVID diagnosis or positive test.
The infection stage is the time of COVID diagnosis or positive test.
The post-infection stage is 12 weeks after the COVID diagnosis or positive test.
Symptoms are ordered by the proportion reporting them at the time of infection.

Figure 2
Percent with Self-Reported Symptoms at Pre-Infection, Infection, and Post-Infection Stages among COVID Long haulers (n=74)

Notes
The pre-infection stage is 4 weeks before the COVID diagnosis or positive test.
The infection stage is the time of COVID diagnosis or positive test.
The post-infection stage is 12 weeks after the COVID diagnosis or positive test.
Symptoms are ordered by the proportion reporting them at the post-infection stage.
The statistical significance of the difference in proportions between the pre-infection stage and the post-infection stage is indicated for Panel B.

Figure 3
Persistent New-Onset COVID Symptoms among those with Long COVID 12 Weeks after Infection

Supplementary Files
This is a list of supplementary files associated with this preprint: SupplementaryInformation.docx