The PRISMA flowchart in Figure 1 outlines the search and screening process. The systematic search yielded 18,959 potentially relevant articles, and one additional article was identified through hand searching. After duplicates were removed, 11,282 articles were screened by title and abstract, and 61 full-text articles were assessed for eligibility. The main reasons for excluding full-text articles were an ineligible study population (participants younger than 15 or older than 20 years) or an inappropriate setting (e.g., middle school or university). In total, nine articles met the inclusion criteria and were included in the qualitative synthesis [36–44]. Agreement among reviewers was moderate after title and abstract screening (κ = 0.53) and very good after full-text screening (κ = 0.87).
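The agreement values above can be illustrated with a minimal sketch, assuming they are Cohen's kappa computed for two reviewers making include/exclude decisions (the source does not state the exact variant). The function name and the example decisions are hypothetical and are not taken from the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumption: two raters, nominal labels, one label per item.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for four records.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Because kappa corrects for chance agreement, it can be clearly lower than raw percent agreement when one decision (here, exclusion) dominates, as is typical of title and abstract screening.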
Quality ratings are shown in Table 2. The global rating was weak for six of the nine studies [36–38, 41–43]. Only one study was rated as strong [40], and two studies were rated as moderate [39, 44]. Across the individual EPHPP domains, blinding was most often rated as weak (n = 7) [36–38, 41–44]. By contrast, study design received no weak ratings (n = 0) and selection bias only two (n = 2) [36, 37]. Four studies were rated as having a strong study design, being either a randomized controlled trial [44] or cluster randomized controlled trials [39, 40, 42]. The other five studies were rated as moderate in study design, with quasi-experimental designs (two groups pre and post [36, 37, 43] or one group pre and post [38, 41]). Ratings in the remaining domains were more mixed: confounders were rated as either strong or weak, whereas data collection methods and withdrawals and dropouts were rated as weak, moderate, or strong.
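How domain ratings combine into a global rating can be sketched as follows, assuming the standard EPHPP rule (no weak domain ratings gives strong, one gives moderate, two or more give weak). The function name and the example ratings are hypothetical and do not reproduce any row of Table 2.

```python
# The six EPHPP domains assumed to contribute to the global rating.
EPHPP_DOMAINS = (
    "selection_bias",
    "study_design",
    "confounders",
    "blinding",
    "data_collection",
    "withdrawals_dropouts",
)


def ephpp_global_rating(domain_ratings):
    """Derive a global rating from per-domain ratings.

    Assumed rule: 0 weak domains -> strong, 1 -> moderate, 2 or more -> weak.
    """
    missing = [d for d in EPHPP_DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")

    weak_count = sum(1 for d in EPHPP_DOMAINS if domain_ratings[d] == "weak")
    if weak_count == 0:
        return "strong"
    if weak_count == 1:
        return "moderate"
    return "weak"


# Hypothetical study: weak only on blinding, so the global rating is moderate.
example_study = {
    "selection_bias": "moderate",
    "study_design": "strong",
    "confounders": "strong",
    "blinding": "weak",
    "data_collection": "strong",
    "withdrawals_dropouts": "moderate",
}
global_rating = ephpp_global_rating(example_study)
```

Under this rule, a single weak domain such as blinding is enough to prevent a strong global rating, which helps explain why few studies reach that level.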
Table 3 provides a detailed overview of study characteristics. Seven of the nine studies were conducted in Europe: four in Germany and one each in Belgium, Finland, and the Netherlands. Two studies were performed at community colleges in Taiwan. Three German studies took place in workplace settings, while the other European studies were conducted at VET schools. Sample sizes ranged from 23 to 231 participants, and mean ages ranged from 15.5 to 19.4 years.
Intervention details are presented in Table 4, with interventions ranging from four weeks to two years in duration. Regarding the addressed behavior, the interventions either focused on PA only [37, 39, 40, 43, 44] or followed a multi-behavioral approach in which, for example, alcohol consumption, life-skills training, and/or nutrition were treated in addition to PA [36, 38, 41, 42]. Three interventions comprised multiple components that either addressed a person’s behavior or additionally adjusted the conditions in the setting [36, 39, 43]. For example, Verloigne et al. offered various PA measures, while Angerer et al. and Hankonen et al. modified the context by providing PA equipment. The other six one-component interventions focused solely on individuals’ behavior, comprising stand-alone information and course offerings that included the provision of information or behavioral training (e.g., information, motivation, and counselling).
Furthermore, the interventions differed in how they were developed and implemented and could be classified as either top-down or bottom-up. Top-down interventions were developed and implemented by experts and followed a theoretical and scientific orientation in terms of their goals and content [36–38, 40, 42, 44]. By contrast, the bottom-up interventions followed a participatory approach, ranging from the target group’s involvement in designing teaching units, through a stepwise intervention development involving different stakeholders, to the entire intervention development and implementation using a co-creation approach.
Further special characteristics of individual studies included, for example, an online-based intervention in the form of a multimedia game or an additional intervention for teachers to reduce their students’ sedentary behavior in class.
The studies’ outcomes are grouped into four major categories: PA, physical fitness, physiological parameters, and psychological factors. Most studies measured more than one of these outcome categories.
Seven studies measured PA either subjectively using standardized questionnaires or objectively using accelerometers. Four of the seven studies [38, 40, 43, 44] found significant baseline to post-intervention improvements in PA. Among these, two studies measured PA subjectively and identified a significant intervention effect on activity level and on extracurricular sports participation, respectively, while two studies measured PA objectively and also found significant effects. Specifically, Lee et al. reported a significant increase in the number of aerobic steps, and Walter et al. reported a significant increase in mean activity intensity. Three studies did not find significant changes in PA level [39, 41, 42].
Physical fitness components were tested by motor performance tests or body analyses in six studies. Two of these studies identified a significant intervention effect on endurance [37, 44]. In another study, a significant decrease in body weight and weight-for-length index was found following the intervention. The remaining three studies found no significant changes in body mass index, body composition, or cardiopulmonary endurance [36, 39, 40].
Physiological parameters, measured through blood pressure or blood tests, were examined in three studies. Only Chen et al. reported significant improvements from baseline to post-intervention in physiological parameters, namely systolic blood pressure, high-density lipoprotein, and total serum cholesterol. In the two other studies, no significant effects on blood pressure, heart rate, sugar metabolism, or fat metabolism were found [36, 37].
Eight studies assessed psychological factors using standardized questionnaires. Of these, three identified a significant change in psychological factors. Hankonen et al. reported a significant improvement in the use of behavior change techniques from baseline to post-intervention in the intervention group. Furthermore, Sickinger et al. found significant improvements in general self-effectiveness expectations, and Verloigne et al. reported a significant intervention effect on self-efficacy. Five studies did not find significant changes in psychological factors, including determinants of PA, mood state, psychological aspects related to mental health, self-efficacy, or self-rating of physical and mental health characteristics [36, 37, 40, 42, 44].
Overall, two studies found significant effects on all measured outcome variables [38, 43], whereas two other studies found no significant effects on any measured outcome variable [36, 42].