Mood disorders (MDs) are a group of diagnoses in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5)1 classification system. They are a leading cause of disability worldwide2, with an estimated total economic cost greater than USD 326.2 billion in the United States alone3. They encompass a variety of symptom combinations affecting mood, motor activity, sleep, and cognition, and manifest in episodes categorized as major depressive episodes (MDEs), featuring feelings of sadness and loss of interest, or, at the opposite extreme, (hypo)manic episodes (MEs), with increased activity and self-esteem, reduced need for sleep, and expansive mood and behavior. As per the DSM-5 nosography, MDEs straddle two nosographic constructs, i.e. Major Depressive Disorder (MDD) and Bipolar Disorder (BD), whereas MEs are the hallmark of BD only4.
To this day, clinical trials in psychiatry rely entirely on clinician-administered standardized questionnaires for assessing symptom severity and, accordingly, setting outcome criteria. With reference to MDs, the Hamilton Depression Rating Scale-175 (HDRS) and the Young Mania Rating Scale6 (YMRS) are among the most widely used scales to assess depressive and manic symptoms7, quantifying behavioral patterns such as disturbances in mood and sleep and anomalous motor activity. The low availability of specialized care for MDs, with rising demand straining current capacity8, is a major barrier to this classical symptom monitoring pipeline. This results in long waits for appointments and reduced scope for pre-emptive interventions. Recent advances in machine learning (ML)9 and the widespread adoption of increasingly miniaturized and powerful wearable devices offer the opportunity of personal sensing, which could help mitigate the above problems10. Personal sensing can involve near-continuous and passive collection of data from sensors, with the aim of identifying digital biomarkers associated with mental health symptoms at the individual level, thereby backing up clinical evaluation with objective and measurable physiological data. It holds great potential for translation into clinical decision support systems11 for the detection and monitoring of MDs. Specifically, automating the prediction of the items of the HDRS and YMRS scales could be particularly appealing, as these items correlate with changes in physiological parameters that are conveniently measurable with wearable sensors12–14.
However, so far, the typical approach has been to collapse MD detection to the prediction of a single label, either the disease state or a psychometric scale total score15,16, which risks oversimplifying a much more complex clinical picture. Figure 1 illustrates this issue: patients with different symptoms, and thus (potentially very) different scores on individual HDRS and YMRS items, are “binned together” in the same category, leading to a loss of actionable clinical information. Predicting all items in these scales can instead align with everyday psychiatric practice, where the specialist, when recommending a given intervention, considers the specific features of a patient, including their symptom patterns, beyond a reductionist disease label17,18. Figure 1 shows a case in point where knowledge of the full symptom profile might make it possible to tailor treatment: on the face of it, patients (a) and (b) (top row) share the same diagnosis, i.e. MDE in the context of MDD; however, considering their specific symptom profiles, patient (a) might benefit from a molecule with stronger anxiolytic properties, whereas patient (b) might require a compound with hypnotic properties. Furthermore, an item-wise analysis can lead to the identification of the symptom specificity of drugs in clinical trials19,20.
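The loss of information incurred by collapsing item scores into a total can be made concrete with a minimal sketch. The item names and scores below are hypothetical values chosen for illustration only; they are not data from this study, and only a subset of scale items is shown.

```python
# Hypothetical item scores for two patients (illustrative values, not study data).
# Only a subset of depression-scale items is shown; the names are assumptions.
patient_a = {
    "depressed_mood": 3,
    "anxiety_psychic": 4,
    "anxiety_somatic": 3,
    "insomnia_early": 0,
    "insomnia_middle": 0,
    "insomnia_late": 0,
}
patient_b = {
    "depressed_mood": 3,
    "anxiety_psychic": 1,
    "anxiety_somatic": 0,
    "insomnia_early": 2,
    "insomnia_middle": 2,
    "insomnia_late": 2,
}


def total_score(items):
    """Collapse an item profile into a single total score."""
    return sum(items.values())


def item_differences(p, q):
    """Return the per-item score differences (p minus q) where the profiles disagree."""
    return {name: p[name] - q[name] for name in p if p[name] != q[name]}


if __name__ == "__main__":
    # Both totals are identical, so a total-score target cannot tell the patients apart.
    print(total_score(patient_a), total_score(patient_b))
    # The item-wise view exposes an anxiety-dominant versus an insomnia-dominant profile.
    print(item_differences(patient_a, patient_b))
```

In this toy case a model trained on the total score alone would treat both patients identically, whereas item-level targets retain the distinction that would matter for treatment choice.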
Table 1 summarizes previous works in personal sensing for MDs and shows that all previous tasks collapsed the complexity of MDs to a single number. Côté-Allard et al.21 explored a binary classification task, namely distinguishing subjects with BD during an ME from different subjects with BD recruited outside of a disease episode, when stable. The study experimented with different subsets of pre-designed features from wristband data and proposed a pipeline leveraging features extracted from both short and long segments taken within 20-hour sequences. Pedrelli et al.22, expanding on Ghandeharioun et al.23, used pre-designed features from a wristband and a smartphone to infer the HDRS residualized total score (that is, total score at time \(t\) minus baseline total score) with traditional ML models. Tawaza et al.24 employed gradient boosting with pre-designed features from wristband data and pursued case-control detection in MDD and, secondarily, HDRS total score prediction. Similarly, Jacobson et al.25 predicted case-control status in MDD from actigraphy features with gradient boosting. Nguyen et al.26 used a sample including patients with either schizophrenia (SCZ) or MDD wearing an actigraphic device and explored case-control detection where SCZ and MDD were either considered jointly (binary classification) or as separate classes (multi-class classification). Of note, this was the first work to apply artificial neural networks (ANNs) directly to minimally processed data, showing that they outperformed traditional machine learning models. Lastly, the multi-centre study of Lee et al.27 investigated mood episode prediction with a random forest and pre-designed features from wearable and smartphone data.
In addition to proposing a new task, our work stands out for a sample size larger than that of all previous works by over two dozen patients, with the exception of the multi-centre study by Lee et al.27, where, however, clinical evaluation was carried out retrospectively, thereby increasing the risk of recall bias28 and missing out on the real-time clinical characterization of the acute phase. Indeed, collecting data from patients during an acute episode, using specialist assessments and research-grade wearables, is a challenging and expensive enterprise.
Relative to previous endeavors, the contribution of this work is two-fold. (1) Taking one step beyond the prediction of a single label, which misses out on actionable clinical information, we propose a new task in the context of MD monitoring with physiological data from wearables: inferring all items in HDRS (17 items) and YMRS (11 items), as scored by a clinician, which enables a fine-grained appreciation of patients’ psychopathology, thereby creating opportunity for bespoke treatment (Fig. 1). (2) We investigate some of the methodological challenges associated with the task at hand and explore possible ML solutions. c1: inferring multiple (28) target variables, i.e. multi-task learning (MTL, see Section 3.4.1). c2: modeling ordinal data, such as HDRS and YMRS items (see Section 3.4.1). c3: learning subject-invariant representations, since, especially with noisy data and a sample size in the order of dozens, models tend to focus on subject-specific features to master the task at hand, leading to poor generalization29 (see Section 3.4.2). c4: learning from imbalanced data, as patients during an acute episode usually receive intensive treatment, and acute states therefore tend to be relatively short periods in the overall disease course30,31, thereby tilting scores towards low values.
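To give a flavor of challenges c2 and c4, the sketch below shows two generic techniques: a cumulative encoding that preserves the ordering of an item's score levels, and inverse-frequency weights that up-weight rare high scores. These are common choices picked here for illustration and are not necessarily the solutions adopted in Sections 3.4.1 and 3.4.2; the function names and the example scores are hypothetical.

```python
from collections import Counter


def ordinal_encode(score, num_levels):
    """Cumulative (ordinal) encoding: entry j is 1 if score > j, else 0.

    A score on a scale with `num_levels` ordered levels (0 .. num_levels - 1)
    becomes a binary vector of length num_levels - 1.
    """
    if not 0 <= score < num_levels:
        raise ValueError("score outside the item's range")
    return [1 if score > j else 0 for j in range(num_levels - 1)]


def ordinal_decode(bits):
    """Recover the score as the number of thresholds that were exceeded."""
    return sum(bits)


def inverse_frequency_weights(scores):
    """Weight each observed level by n / (num_observed_levels * count)."""
    counts = Counter(scores)
    n = len(scores)
    return {level: n / (len(counts) * c) for level, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical scores for one item rated 0-4, skewed towards low values.
    scores = [0, 0, 0, 0, 0, 0, 1, 1, 2, 4]
    print([ordinal_encode(s, num_levels=5) for s in scores[-3:]])
    print(inverse_frequency_weights(scores))
```

In a multi-task setting (c1), one such encoding would be produced per item, using that item's own number of levels, so that the 28 targets can be learned jointly while each retains its ordinal structure.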