Crowdsourcing Without Data Bias: Building a Quality Assurance System for Air Pollution Symptom Mapping Toward an SDG Indicator

The United Nations (UN) sustainable development goals (SDGs), a strategy to guide the world’s social and economic transformation, highlight the issue of urban air pollution in SDG 11. Open data, as an output of citizen science (CS), are needed to supply and improve the SDG indicator system. Therefore, we propose a CS framework to extend the paradigm of urban air pollution monitoring from particulate matter concentration levels to air quality-related health symptom load, and foster the development of a tier-3 SDG indicator (which we call indicator 11.6.3). Building this new perspective for CS contributions to the achievement of SDGs, we address the problem of crowdsourced data bias as a prerequisite for better quality open data output. The aim of this study is to propose an air pollution symptom mapping framework for citizen-driven research and to find the most robust data quality assurance system (QAs) in this field. The method includes a GeoWeb application as well as data quality assurance mechanisms based on conditional statements, in order to reduce crowdsourced data bias. A four-month crowdsourcing campaign, released in Lubelskie voivodship (Poland), resulted in 1823 outdoor reports with a rejection rate of up to 28%, depending on the applied QAs. Testing the QAs variants, we find the most robust data bias reduction method for survey-based symptom mapping. The framework output is shared via GeoWeb dashboards, including the 11.6.3 indicator evaluation. By familiarizing the public with citizen science, a city can track the progress of its SDG achievements and increase the transparency of the process through the use of GeoWeb.


Introduction
Urban air pollution is well known to cause negative health impacts. Therefore, the monitoring of pollutant concentration levels plays a key role in understanding air quality and its effects on the subjective well-being (SWB) of citizens (Laffan, 2018). SWB reflects the philosophical notion of a good life, serving as a proxy for assessing life satisfaction, momentary experiences, and stress. Kim-Prieto et al. (2005) also took into account contemporary health hazards, among which air pollution is a key factor (Ferreira et al., 2013; Signoretta et al., 2019). To promote public well-being while protecting the environment, the UN has targeted 17 Sustainable Development Goals (SDGs) and their indicators (Koch and Krellenberg, 2018) to track the overall progress towards 2030. SDGs have become fundamental strategies to guide the world's social and economic transformation (Shi et al., 2019), putting emphasis on respecting natural resources and the needs of future generations. Of the SDGs, the 11th SDG is targeted at reducing the adverse per capita environmental impact of cities, including paying special attention to air quality; additionally, SDG target 3.9 aims to reduce the number of illnesses caused, among other factors, by air pollution (UN, 2015). The development of smart and sustainable cities can only be accomplished through inclusive growth, using smart people, technologies, and policies. From the perspective of Smart People (Giffinger et al., 2007)-those who use smart devices to make their everyday living easier and their health safer-we found it necessary to develop GeoWeb solutions to measure the adverse impact of cities on their inhabitants. Ensuring measurement credibility becomes a key scientific challenge in this context. To this end, we carried out research on the example of air pollution health symptoms, an emerging trend particularly related to odour (Arias et al., 2018) and green pollutant (Bastl et al., 2015) crowdsensing (Dutta et al., 2017; Feng et al., 2018).
As crowdsensing (or, more generally, crowdsourcing) methods for health symptom mapping are subject to data bias (Zupančič and Žalik, 2019), we developed and tested the quality assurance mechanism (QAm) framework (Section 2.2), which can be transferred to similar health symptom-based studies. From a practical point of view, we use a case study crowdsourcing data set to track the progress on SDG 11 with the use of tier-3 SDG 11.6 indicators (Koch and Krellenberg, 2018). The tier structure of the SDG indicator system defines tier 3 as a group of indicator candidates for which no agreed measurement methodology is available; we use this tier to propose a measure of citizen SWB with respect to adverse per capita air pollution impact.
Sparse or irregular monitoring station networks, as well as limited access to reference air pollution data, underlie the need for CS activities in the field of air pollution monitoring. Personalized information on exposure to air pollutants, monitoring during acute events or at specific locations, partnerships with local governments, and educational and community-driven purposes are the key benefits of bottom-up environmental monitoring. CS enables the collection of data on much larger spatial and temporal scales and at much finer resolution than would otherwise be possible. The issue of urban air pollution crowdsourcing has motivated the implementation of several citizen science (CS) programs, such as those led by Mapping for Change. By crowdsourcing air pollution symptoms (APSs), we indicate new possibilities for citizen-driven research and social inclusion in environmental and health-related issues, as experienced by sustainable cities. We also contribute to the development of an sCS data quality methodology.
Furthermore, the quality of the spatial data determines its usability in the field of SDG indicators (OWA, 2015; Hecker et al., 2018).

Sustainable Development Goal Partnership on Urban Air Pollution
Sustainability is an interactive process, maintaining a dynamic balance among six dimensions-land, natural environment, institutions, technology, economics, and humans-where change within one dimension has an impact on the others (Dockry et al., 2016). These changes and impacts may occur globally, which is why GIScience (Goodchild, 2009) plays a leading role in achieving the global SDGs.
The general idea for the SDGs has been expressed, by Brundtland et al. (1987), as the need to "meet the necessities of the present generation without harming the future generation's capacity". We consider health symptoms caused by air pollution as one of the indicators of the current ecological footprint of humanity on the environment. In 2000, Brundtland's idea was formulated as the Millennium Development Goals by the UN (Sachs, 2012). Over the next 15 years, the idea was emphasized as the interconnected environmental, health, social, and economic aspects of development (Schleicher et al., 2018) in SDG 2030. Out of the 17 SDGs, the 11th SDG refers to sustainable cities and communities and the third SDG refers to public health.
In both cases, poor air quality caused by ambient air pollutants (in particular, referring to SDG target 11.6) is a key issue, which provided the motivation for this research. The Organization for Economic Cooperation and Development (OECD) predicts that, by 2050, air pollution will be the main cause of human mortality (Marchal et al., 2012). Therefore, we focused on urban air pollution as a case study, with the starting point of health symptoms caused by human exposure to air pollutants. This concept highlights the relationship between urban habitats and the SWB of citizens, as well as the interdependence of the particular SDGs. The issue of air pollution requires spatial information provided thoroughly by a modern spatially enabled society (Enemark and Rajabifard, 2011; Ionita et al., 2015). This underlines the need to implement a local partnership between air pollution monitoring agencies, researchers, and the local community, who all breathe the same air. Therefore, high quality geospatial data, in terms of air pollution, is expected to be used for monitoring global progress towards achieving the SDGs.
The need for open geospatial technologies for measuring SDG 11.6 has been discussed by Choi et al. (2016), with special attention paid to sCS as a public-academic partnership in which citizens collect data that is then used by research institutions and by the citizens themselves. By engaging in air pollution symptom mapping (APSM), the project members, as citizen scientists (Bonney et al., 2016), facilitate the implementation of the SDGs to become an integral part of social innovation.

Extending the Paradigm of Urban Air Pollution
In the field of environmental research on air quality, information on the quality (i.e., clean or polluted) of air is reported as an air quality index (AQI; Liu et al., 2019). The AQI tracks six major air pollutants: inhalable particles (PM 10 ), fine particulate matter (PM 2.5 ), ozone (O 3 ), sulfur dioxide (SO 2 ), nitrogen dioxide (NO 2 ), and carbon monoxide (CO; Sheng and Tang, 2016). The spectrum of pollutant sources includes those related to the development of human civilization (anthropogenic pollutants; Grewling et al., 2019), as well as those from natural sources, which questions the belief that everything that is natural is healthy (Liang, 2013). Ambient air pollution concentrations above the approved limits (Fussell, 2015; Zwozdziak et al., 2016) can cause certain health symptoms. Conversely, health symptoms can reflect air pollution. However, health symptoms resulting from inhalation of polluted air are also stimulated by natural-sourced biophysical PM, such as pollen and mold spores (Grewling et al., 2019). In terms of air quality, aerobiologists focus on plant species whose pollen is most harmful for pollen allergy sufferers (e.g., birch, alder, mugwort, grass, and so on) and emphasize that their co-occurrence with PMs is affected by other factors, such as increasing urban air temperature (Grewling et al., 2019). Bastl et al. (2015, 2017) described pollen as one of the "green pollutants", which are significant components of the atmosphere and are relevant to air quality information for pollen allergy sufferers. This distinction is important for the comprehensive understanding of APS. Air pollution is specified as the concentration of pollutants measured in physical values (e.g., micrograms per cubic meter), whereas air quality refers to the AQI, as well as to classifications, opinions, and feelings, including the experiences of citizens in terms of air pollution- and air quality-related SWB (Laffan, 2018; Signoretta et al., 2019).
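For context, many operational AQI schemes (e.g., the US EPA index) report the overall value as the maximum of the per-pollutant sub-indices. The following is a minimal sketch of that final step only; sub-index computation from concentration breakpoints is omitted, and the function name is ours:

```python
def overall_aqi(subindices: dict) -> tuple:
    """Return the overall AQI value and its dominant pollutant.

    Assumes each pollutant's sub-index has already been computed
    from its concentration breakpoints (omitted here); the overall
    index is the maximum sub-index.
    """
    pollutant = max(subindices, key=subindices.get)
    return subindices[pollutant], pollutant

# Illustrative sub-index values only, not measured data:
print(overall_aqi({"PM10": 80, "PM2.5": 132, "O3": 60,
                   "SO2": 20, "NO2": 45, "CO": 10}))  # -> (132, 'PM2.5')
```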
This broad understanding of air quality is accepted in ecosystem services science, where poor air quality is referred to as an ecosystem disservice (Escobedo et al., 2011; Sacchi et al., 2017). This concept extends our understanding of air pollution from pollutant concentration levels to personal health symptoms caused by pollutant inhalation. The quantity and severity of symptoms can explain the air quality; however, consensus about the terminology involving urban air quality has not yet been reached, and researchers typically distinguish air pollution from pollen exposure (McInnes et al., 2017). There is no symptom classification for air quality yet. Regardless, both factors shape air quality. Future research is required to understand and quantify the interaction of co-exposure to both types of air pollutants and its impact on the severity of human health symptoms (Robichaud and Comtois, 2019).
Symptom mapping is a prerequisite for the spatial explanation of both dependencies. The first attempts at citizen symptom mapping related to green pollutants were undertaken by Bastl et al. (2017) and Werchan et al. (2017). Their research proved that citizen symptom load can be mapped efficiently using crowdsourced data; however, the sources of the symptoms cannot be clearly determined. The symptom load index is not directly correlated with annual pollen loads but shows a strong, often daily, linear correlation with allergen content (Bastl et al., 2017; Bédard et al., 2020). Finding that relationship is beyond the scope of this paper; however, crowdsourced symptom data have shown potential as an indicator of the effects of urban air pollution on citizen well-being. This raises the possibility of a new tier-3 SDG indicator, monitored following a standardized CS method. The unstructured nature of crowdsourced data requires rigorous QA mechanisms. In this study, our aim is to identify a QA system for APSM and provide a GeoWeb framework to stream high-quality data, in order to facilitate a tier-3 SDG indicator system. So far, this data stream does not exist. By sharing trusted and open data on air pollution symptoms, our findings can be used for aerobiological and health risk forecasting research.

Contribution of Citizen Science to Improvements in Air Pollution Mapping
According to Haklay (2013), geographical citizen science overlaps with volunteered geographic information (VGI), especially in the geographical context of citizen-driven research. GeoWeb plays an essential role in this field. However, it is crucial that CS and VGI are not seen as equal, as the main purpose of VGI is to produce geographical information, whilst citizen science aims to produce new scientific knowledge (Connors et al., 2012; Eitzel et al., 2017). Citizens engaged in scientific research projects become citizen scientists (Silvertown, 2009) who, depending upon their personal interests, motivation, education level, and experience in previous projects, engage with different levels of participation and expect to see the results of their research contribution. They contribute to the project by collecting and analysing data, but may also be involved in defining research questions or even interpreting results (Dickinson et al., 2010; Haklay, 2013; Kar et al., 2016). Considering the scope of citizen participation, Haklay (2013) defined four levels of CS: crowdsourcing (first level), distributed intelligence (second level), participatory science (third level), and extreme citizen science (fourth level). Citizen involvement in environmental projects on air pollution is usually based on collecting and analysing sensor data in the form of online maps. In this way, knowledge is produced. The fundamental questions about the harmful health effects of air pollutants have already been asked, so these activities are typified as CS level 1 and CS level 2. Of course, higher levels (depending on the engagement of members) are not excluded. In the case of odour crowdsourcing, which requires training of members as well as feeding measurement insights back to them, a collection method can be devised together (i.e., level 3).
Reviewing the most relevant air pollution citizen science activities (Table 1), the typology of participation engagement can be assigned based on the project description; however, this does not prevent the engaged members from achieving the next levels through the re-use of data, scientific collaboration, and report publishing. Our study was based on the first level of CS, where citizens are engaged in the process of crowdsourcing APS data to monitor progress toward the achievement of SDG 11.6.3, producing new scientific knowledge of APSM together with researchers. CS provides a solution to research problems while also educating citizens. Before starting to collect data in this study, citizens were educated about the research problem and project aims and were trained how to use the associated tools properly. By attending workshops, the citizens gained knowledge and new skills, and followed the progress of the project in real-time. By sharing their conclusions and opinions during the social campaign, they had a direct impact on the optimization of the methods used.
So far, smartphones have not been considered appropriate equipment for measuring urban air pollution. This is due to the fact that the built-in sensors of smartphones, by default, do not allow users to measure air pollutant concentrations. Therefore, bottom-up activities considering air pollution have usually relied on external, low-cost sensors (initially only capable of PM measurement, these sensors can now also sense all major pollutants, including volatile organic compounds). In an attempt to involve smartphone users in air pollution monitoring, efforts have been made to determine the PM concentration with the use of a mobile app which takes images of clear blue skies (the AirTick project), estimating average daytime PM concentration levels with an accuracy of up to 87% (Zhu et al., 2018). Other approaches have used spectropolarimeters as add-ons, such as within the iSpex project (Snik et al., 2014), to measure the PM concentration level. The idea of using a smartphone camera to measure air pollution has been adopted by the HackAir project (Kosmidis et al., 2018). Furthermore, the recent development of a smartphone camera- and flash-based fine dust measurement system called FeinPhone (Budde et al., 2019) suggests that low-cost PM sensors may become default equipment in next-generation smartphones. Low cost and relatively good correlations with reference air pollution stations (Karagulian et al., 2019) allow users to set up citizen science initiatives and involve local communities in global problem solving. The most relevant of these projects are listed in Table 1, which is an extension of the review carried out by Moumtzidou et al. (2016). The relatively simple design of citizen science sensors makes them suitable for do-it-yourself (DiY) workshops.
Creating local workshop groups, usually co-ordinated by a local Media Lab, allows the establishment of communities which are emotionally involved in self-created monitoring networks, which becomes the basic mechanism motivating the continuation of the local monitoring project. Furthermore, the growing awareness of air pollution hazards has led to the development of personal sampler devices (e.g., PlumeLab) designed to be mobile and to facilitate real-time monitoring of exposure to air pollution; such new smart devices could be used effectively in citizen science activities.
Coupled with an application for health symptom recording, they could progress our understanding of air pollutants, their coexistence, and their relationships with human health (Bédard et al., 2020). Citizen measurements were formerly conducted in a stationary manner through the use of passive diffusion tubes (Palmes et al., 1976) or wipes for pollution measurement; at present, such measurements can successfully be carried out in a mobile way through the use of smart sensors. Loreto et al. (2017) emphasized that modern participatory sensing, which is one of three sub-categories of citizen cyberscience (Grey, 2009), has witnessed significant progress related to the fast development of ICT (Information and Communication Technologies) and social networking tools, which "allow effective data and opinion collection and real-time information sharing processes". In that context, Guo et al. (2015) and Capponi et al. (2019) introduced mobile crowdsensing (MCS), which focuses on sensing and collecting data with mobile devices and aggregating the data in the cloud. However, there are pollutants which are still exclusive to IoT 'sensor dust'. A great challenge of contemporary CS measurement is odour sensing, which affects both indoor and outdoor air quality. Human-sensed air pollution monitoring seems to be an emerging trend. In this research, we follow the "citizens as sensors" and participatory sensing concepts, where the senses, subjective impressions, and perception of humans are the only sensors used in the project; therefore, we propose human-sensed survey questions. The survey design allowed us to select and reject attribute table contradictions, in order to reduce data bias, such as user response inconsistency, location inaccuracy, and duplicate time-space-related reports.
By combining several logic-based data quality assurance mechanisms (QAms), we tested the robustness of the QAms to find the strongest QAm set and build a ranked data quality assurance system (QAs).
To achieve the SDG 11.6 target, reliable sources of spatial data are needed. We did not solve the problem caused by the non-air pollution-related factors which affect human symptom severity and act synergistically with air pollution; rather, we contribute to the robustness of spatial databases on health-related symptoms. The goal of this study was to answer the question of quality assurance mechanism implementation in GeoWeb-based APSM. For this purpose, we propose a dedicated air pollution symptom mapping (APSM) framework with the following QA mechanisms: start-check, sequence, cross-validation, repeating, and time-loop check (see Section 2). Our research question was: Which QAs best reduces data bias in APSM? The sources of data bias include contradictory entries in the geodatabase attribute table, recorded as answers supplied to the specially created APSM survey. By answering the research question, we aim to underline the importance of CS for the achievement of the SDG 11 and 11.6 targets.

Materials And Methods
For our participatory APSM project, we followed an established CS development framework, starting from the research question and project team formulation through to CS action execution and the dissemination of project findings (Sects. 2.1 and 2.3). However, we focused on addressing data bias (Sect. 2.2) to improve the symptom-based air pollution mapping data quality, as a contribution to the achievement of SDGs through the provisioning of spatial information, as well as social inclusion in sustainable development.

Building the Project Team and Field Data Collection Strategy
Having defined the scientific question, we formed the project team based on the following recommended roles:

Scientist: responsible for formulating the research question, crowdsourced data protocol design, co-operation with citizens, and answering the research question.
Educator: responsible for training the participants.
Technologist: provides GeoWeb tools to the project members. Technologically, the project is based on the cloud, and configurable applications are built using the "puzzle" idea in ArcGIS Online (AGOL; Esri Inc., Redlands, CA, USA; Fargher, 2018).
Evaluators: researchers and medical doctors who work in the field of air quality (including pollen allergy) related to daily symptoms. They are engaged in the app testing process.
Citizens: collect APSM data and follow the results through web map apps.
To turn students into citizen scientists (Harlin et al., 2018), we engaged teachers and students. The crowdsourcing campaign was planned for one academic semester, starting in February and finishing at the end of May. Starting the campaign in the first quarter of the year is crucial, as pollutants and pollen occur simultaneously at the beginning of the year; in particular, gaseous pollutants can act as "adjuvants", exacerbating pollen allergenic potency and immunoreactivity (Ring et al., 2001). At the beginning of the campaign, the group of citizens involved in the project was formed, which included students and non-academic participants. Students of different faculties of the University of Life Sciences in Lublin (Poland) were invited as volunteers. The core of the group consisted of students co-operating within their scientific student organization. The project was continually open to everybody. The researchers and Ph.D. students of the University of Warsaw (Poland) were responsible for the technological part of the project. Together, they formed the community channel for data and app sharing, which was implemented in GeoWeb.
Before the field data collection campaign, the scientific student organization of the Spatial Management Faculty of the University of Life Sciences in Lublin organized workshops for the citizen scientists, who learned about the research project assumptions and were trained in handling the mobile and web apps (details about the apps are provided in Sect. 2.3). The workshops, training, information, and research project promotion among citizens lasted for the first month. The citizens learned that the mobile app requires initiation just after being turned on, in order to fix the GNSS (Global Navigation Satellite System) positioning accuracy. Other educational materials were made available in the narrative web apps. The citizens were asked to collect data during their daily outdoor activities, preferably once per day. If they observed air pollution-related symptoms, they were asked to report them as soon as possible. If they caught a cold or were sick, they were expected to stop collecting data until they recovered. The project assumptions and rules for collecting data were included in the mobile app, as an introduction to the study. A wider version of the user guide was available, at any time, in the web mapping application.
After completing the survey in the mobile app, the user was geolocated such that the observed individual symptom severity was presented as a point on the map immediately after it was sent to the cloud.
Our study referred to the patterns of citizen activity characteristic of CS (Seymour and Haklay, 2017). As such, we implemented a dedicated module for improving citizen motivation, based on a monthly activity ranking of users presenting the number of submitted APS reports, which was available to citizens in real-time. Users were assigned award titles according to the number of reports sent per month: 1-5, Beginner; 6-10, Pretty Involved; 11-20, Super Engaged; and > 20, Excellent Citizen Scientist. A user activity tracking module was included in the operations dashboard app (details in Sect. 2.3), which listed the 10 most active citizens, presented by their nicknames (checked for consistency using the PIN provided by each citizen in their first survey) together with their award titles and number of reports in the last month. This mechanism helped to increase user engagement, as the ranking list was public and allowed for competition between the citizens.

To alleviate the problem of data bias in citizen-driven mapping, we used a method based on specific conditional statements implemented in the survey questions. The proposed method includes data forms, which are the basis of the developed conditional statements. The data were displayed as a text data type in the mobile app, which is simple and intuitive for the user. The data were also coded in the database in the short integer data type (except for question 12 (Q12), which was coded as text), and were used for data analysis and statistics (Table 2). We initially adopted three QA methods in the mobile survey app, in order to improve data quality during the data collection process. These methods were based on the quality measures of ISO 19157 (2013): positional and temporal accuracy, data completeness, and consistency.
The first QA method eliminates identical reports sent from the same location within a certain time interval (5 minutes), in case the same report was duplicated. Reports lacking geolocation were excluded from the database by the second QA method. The third method controlled the GNSS positioning accuracy of the reported APS observations, under the assumption that reports with a horizontal accuracy error greater than 100 meters were outliers, which were eliminated at the data collection stage. If the surveys were accepted under the three QA methods described above, they were finally checked with the completeness quality measure: surveys which were not completed, in terms of the obligatory questions, were automatically blocked against submission through a mechanism configured in the app.
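A minimal sketch of the three collection-stage checks described above, assuming a simple report dictionary (the field names `lat`, `lon`, `accuracy_m`, `answers`, and `time`, and the function name, are ours, not the project schema):

```python
from datetime import datetime, timedelta

def passes_collection_qa(report: dict, accepted: list) -> bool:
    """Collection-stage QA sketch with illustrative field names.

    Rejects a report if it (1) lacks geolocation, (2) exceeds a 100 m
    horizontal accuracy error, or (3) duplicates an identical report
    sent from the same location within a 5-minute window.
    """
    # QA 2: reports without geolocation are excluded.
    if report.get("lat") is None or report.get("lon") is None:
        return False
    # QA 3: horizontal accuracy error must not exceed 100 m
    # (missing accuracy is treated conservatively as a rejection).
    if report.get("accuracy_m", float("inf")) > 100:
        return False
    # QA 1: identical answers, same location, within 5 minutes.
    for prev in accepted:
        same_place = (prev["lat"], prev["lon"]) == (report["lat"], report["lon"])
        same_answers = prev["answers"] == report["answers"]
        close_in_time = abs(report["time"] - prev["time"]) <= timedelta(minutes=5)
        if same_place and same_answers and close_in_time:
            return False
    return True
```

The completeness check (blocking submission of incomplete obligatory questions) is enforced inside the survey app itself and therefore has no counterpart in this post-hoc sketch.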

Logic-Based Data Quality Assurance Mechanisms Implemented After Data Collection Process
The proposed QA framework for APSM includes five QA mechanisms, which work as combinations of specific conditional statements. The logic formula for each conditional statement was built to filter and eliminate data bias-identified as false data-such that false results were returned (Table 3). The QA mechanisms of start-check, sequence, cross-validation, repeating, and time-loop check were proposed, with one or two levels of robustness (Table 4), and were finally combined into a QA system (Table 5). These QA mechanisms were implemented in the database after the data collection process was finished. The conditional statement algorithms were combined into QA mechanisms, some of which were proposed and studied in two robustness variants, and implemented in the APSM. To date, such a method has not been implemented in an sCS air pollution-related symptom mapping project. As a data quality assurance framework for APSM, we propose a data quality assurance system (QAs), which is a combination of the QA mechanisms, depending on the QA mechanism robustness variant. The choice of the QA system variant depends on the project character: each QA mechanism works independently and, so, can be implemented in the APSM project separately or in any combination, if needed.

Table 4. Quality assurance mechanisms implemented in the citizen air pollution-related symptoms questionnaire, with less ("1") and more ("2") robust variants.

QA mechanism      | QA mechanism code | Conditional statement combination
Start-Check       | SC1               | Con.1 or Con.2
                  | SC2               | Con.3 or Con.4
Sequence          | Sq                | Con.5 or Con.6 or Con.7
Cross-validation  | CV1               | Con.8
                  | CV2               | Con.9
Repeating         | Rp1               | Con.10 or Con.11
                  | Rp2               | Con.12 or Con.13
Time-loop Check   | TC                | Con.14 or Con.15 or Con.16 or Con.17

The start-check mechanism was used to verify the report consistency at the beginning of the survey, excluding reports whose symptom severity answers are not consistent with the general well-being question.
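The combination logic of Table 4 can be read as a small boolean layer over the conditional statements. The following sketch is illustrative only: the `Con` names stand in for the concrete statements of Table 3, and all function names are ours:

```python
# Each Con.i is a predicate evaluated on one report; here they are
# placeholders. A mechanism flags a report when ANY of its statements fires.
MECHANISMS = {
    "SC1": ["Con1", "Con2"],
    "SC2": ["Con3", "Con4"],
    "Sq":  ["Con5", "Con6", "Con7"],
    "CV1": ["Con8"],
    "CV2": ["Con9"],
    "Rp1": ["Con10", "Con11"],
    "Rp2": ["Con12", "Con13"],
    "TC":  ["Con14", "Con15", "Con16", "Con17"],
}

def mechanism_rejects(code: str, results: dict) -> bool:
    """True if any conditional statement of the mechanism fires.

    `results` maps statement names to booleans evaluated on one report.
    """
    return any(results[c] for c in MECHANISMS[code])

def qa_system_rejects(codes: list, results: dict) -> bool:
    """A QA system is a chosen combination of mechanism variants;
    a report is rejected if any selected mechanism rejects it."""
    return any(mechanism_rejects(code, results) for code in codes)
```

Because each mechanism is an independent disjunction, any subset of mechanisms (e.g., `["SC2", "Sq", "CV1", "Rp1", "TC"]`) forms a valid QA system, which mirrors the modularity stated above.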
The quality assurance method of applying a general question about the issue preceding the detailed questions was used by Bastl et al. (2015) and Bousquet et al. (2017); however, these studies did not report on the success of using this conditional statement. We examined it in two variants: Variant 1 is less robust and assumes that the report is consistent when the citizens assess their current comfort as good (1) and the answers to Q2-Q7 are between no symptoms (0) and moderate symptom severity (2). If the answer for Q1 is poor self-comfort (3), then at least one question among Q2-Q7 must be answered as strong symptom severity (3). Variant 2 is stricter and assumes that the possible answers for Q2-Q7 can only be no symptoms (0) or mild symptom severity (1) when the current comfort is rated as good (1). If the answer for Q1 was poor comfort (3), the same conditional statement was used as in variant 1. For both variants, reports with the answer "I have no opinion" (0) for Q1 were excluded.

The sequence mechanism was applied to exclude user "automatism" in providing answers, which is often caused by a citizen giving rash answers or not reading the questions. Therefore, each report with all questions answered by responses at the same position in the sequence (e.g., every first answer for each question) was eliminated from the database. The QA mechanism is based on the method of rearranging the order of possible answers (Albaum and Oppenheim, 1993; Garbarski et al., 2015) for one question in a sequence of similarly asked questions. The standard order of the answers for questions Q2-Q7 was: 1, 2, 3, 0. Question Q5 was an exception, with an answer order of: 3, 0, 1, 2. If the citizen repetitively chose the first answer for questions Q2-Q7, then the QA mechanism excluded the report. The same rule was applied to reports with the third answer for questions Q2-Q7 and for the last answer in questions Q2-Q7.
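The start-check variants and the sequence check can be sketched as follows, using the answer codes from the text (Q1 comfort: 0 = no opinion, 1 = good, 3 = poor; Q2-Q7 severity: 0-3). The function names are ours, and answer codes not covered by the stated rules are conservatively kept:

```python
def start_check_rejects(q1: int, q2_q7: list, variant: int = 1) -> bool:
    """Start-check sketch. Returns True if the report is rejected."""
    if q1 == 0:                            # "I have no opinion" -> excluded
        return True
    if q1 == 1:                            # good comfort
        cap = 2 if variant == 1 else 1     # max severity allowed per variant
        return any(s > cap for s in q2_q7)
    if q1 == 3:                            # poor comfort: at least one strong symptom
        return not any(s == 3 for s in q2_q7)
    return False                           # codes not covered by the rules are kept

def sequence_rejects(chosen_positions: list) -> bool:
    """Sequence-check sketch: True if the citizen picked the answer at the
    same listed position for every question Q2-Q7 (possible because Q5's
    answer order is rearranged relative to the other questions)."""
    return len(set(chosen_positions)) == 1
```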
The cross-validation mechanism was used to reject responses by using three essentially related questions. If the answer to the additional question was not consistent with one of the two previously answered related questions, the report was excluded.
The APS mentioned in Q8 should be related to the runny nose (Q4) or watering eyes (Q5) symptoms. The mechanism was tested in two variants: Variant 1 assumes that reports with no runny nose (Q4 = 0) and no watering eyes (Q5 = 0) symptoms but very often rubbing eyes (Q8 = 3) observations are eliminated. Variant 2 is much stricter and additionally excludes reports with often (Q8 = 2) and even seldom (Q8 = 1) rubbing eyes symptoms.

The repeating mechanism determines the consistency of the report with the other, previously answered, questions by asking the same question in a different way. If the repeated answer is not consistent with the former one (Albaum and Oppenheim, 1993; Wiggins et al., 2011), the report was excluded from the database. This was used with Q9, which repeated the content of Q1 about the citizen's self-comfort. The mechanism was examined in two variants: According to variant 1, the report was eliminated if the minimum and maximum answer codes of Q1 and Q9 were opposites (e.g., answer code 2 for Q1 was consistent with Q9 answer codes 1, 2, or 3). Variant 2 was more robust, assuming that the answer codes of Q9 and Q1 had to be the same, with each exception from this rule excluded from the database. The mechanism was also used to compare the answer to Q9 with the answers to the severity symptom questions (Q2-Q7). When the answer for Q9 was poor self-comfort (3) and none of questions Q2 to Q7 were answered as strong symptom severity (3), the report was excluded. The conditional statement for Q9 answered as a high level of comfort (1) was examined in two variants: In the first variant, a report with at least one answer in Q2-Q7 representing strong symptom severity (3) was eliminated when the answer for Q9 was coded 1. According to the second variant, a report was eliminated if the reported symptom severity in any of Q2-Q7 was assessed as moderate (2) or higher.

The tools for the APSM project are based on GeoWeb.
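The cross-validation (Q4/Q5 vs. Q8) and repeating (Q1 vs. Q9) rules above can be sketched as follows. The function names are ours, and only the Q1/Q9 part of the repeating mechanism is shown:

```python
def cross_validation_rejects(q4: int, q5: int, q8: int, variant: int = 1) -> bool:
    """With no runny nose (Q4=0) and no watering eyes (Q5=0), frequent
    eye rubbing (Q8) is inconsistent: variant 1 rejects only Q8=3,
    variant 2 also rejects Q8=2 and Q8=1."""
    if q4 == 0 and q5 == 0:
        threshold = 3 if variant == 1 else 1
        return q8 >= threshold
    return False

def repeating_rejects(q1: int, q9: int, variant: int = 1) -> bool:
    """Q9 repeats Q1 (self-comfort): variant 1 rejects only opposite
    extremes (good, 1, vs. poor, 3); variant 2 requires identical codes."""
    if variant == 2:
        return q1 != q9
    return {q1, q9} == {1, 3}
```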
We used ArcGIS platform components, available to the technologist as a puzzle-like structure, which allowed for direct customization of the applications to implement the APSM assumptions and requirements. For the project, we configured the mobile app, and a set of web apps was publicly shared for citizens.

Mobile App for Crowdsourcing
The mobile app was based on the Survey123 for ArcGIS components and is available at the public link: https://arcg.is/0HWXrO. The survey consists of six information pages to facilitate its use and clear navigation (Fig. 2a). It is available in two languages: Polish and English (Fig. 2b). The app starts with an introduction containing a short user guide (Fig. 2c), which explains the research rules and how to use the survey app (page 1), followed by basic user info (nickname, four-digit PIN, and student/non-academic status; Fig. 2d), helping the users in the citizen group to control the data collection process (page 2). The next pages (3-5) include the 12 APSM questions, which are completed with the user geolocation and the date of the report (page 6). All obligatory questions are marked with a red star (Fig. 2e). The third page focuses only on the general well-being level of the citizen (Fig. 2f), which is the basis for the start-check mechanism. Then, the citizens answer questions about their individual symptoms using drop-down lists of answers (Fig. 2g). In the summary (page 5), the citizens specify their level of well-being using a star rating scale, where one star means the lowest and three stars indicate the highest level of well-being (Fig. 2h). Then, using the calculator appearance widget, the users report how long they have stayed in their location and how long the symptoms have been observed; these values are expressed in minutes (Fig. 2i). Question 11, regarding the length of the observation of symptoms, is fixed in the app as relevant only when any APS are observed: if Q2-Q7 are all answered as "no symptoms", then Q11 does not appear in the survey. On the last page of the app, a map widget is presented to mark the current location and date (Fig. 2j).
Here, app users are told that all reports with a horizontal positioning accuracy error greater than 100 m are automatically eliminated, as these values are considered GNSS positioning accuracy outliers. When completing the survey, the user can check the current location status at any time (i.e., latitude, longitude, and horizontal accuracy; Fig. 2k). The default date is set to the current date, and the geolocation defaults to the current GPS location of the user. When the survey is completed, a bottom-right submit tick becomes active and the report is ready to send to the cloud geodatabase.

The accepted symptom observation length was defined as between 1 minute and 2 hours. This mechanism assumes that all reports with a time-loop value higher than the duration the user remained in the place were eliminated. We excluded all reports with symptom observation lengths equal to one minute. Reports with symptoms observed for more than two hours were excluded as well, as such information indicates a late-phase reaction, which is not connected with the current geolocation of the citizen. When the database was completed, the technologist connected the database as an AGOL service (REST) to the desktop solution and prepared the data set through the implementation of the logic-based QA mechanisms. To analyse the robustness of the QA mechanisms, as well as the survey result statistics, we used the R statistical software (R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org). Finally, the analysis results were published back to the cloud and presented in the public web mapping application. The results were reported as the percentage of observations eliminated by the most robust QA mechanism combination, representing the total robustness of the implemented QA system.

Field Data Collection Module: Mobile App
The questions proposed for the survey in Sect. 2.2 were included in the configurable Survey123 application. The mobile survey was divided into individual pages, in order to help the user quickly navigate between questions without scrolling through the whole form. An offline mode was provided, so users could choose whether to use an internet connection when collecting data. All survey fields were obligatory. We used a user authorization option with an alphanumeric nickname and a four-digit PIN at the beginning of the survey, in order to follow the reports of each citizen while preserving their anonymity. Q1 to Q9, concerning well-being and symptom severity aspects, were implemented as single-choice questions. Q10 and Q11 were displayed using a calculator functionality, facilitating the entry of time data expressed in minutes. Q12 was expressed as a multiple-choice collection. The last section of the mobile survey contained a map, where the mobile app user could set their geolocation. The last field in the app was used to record the current date of the symptoms observed.
The field data collection module was available for free, with the option of creating the survey app icon on the smartphone screen. It was available in two languages, Polish and English, and users could choose their desired language when filling in their responses. To start using the survey, we advised users to download the Survey123 for ArcGIS app from Google Play (Google LLC, Mountain View, CA, USA) or the App Store (Apple Inc., Cupertino, CA, USA) to their mobile device. Then, using the public survey link, the form was downloaded to the app. The link for the survey could also be entered into a web browser.
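The survey's conditional relevance rule described earlier (Q11, on the length of symptom observation, appears only when at least one symptom in Q2-Q7 is reported) can be sketched as follows. The answer coding (0 for "no symptoms") and question keys are assumptions for illustration.

```python
NO_SYMPTOMS = 0  # assumed answer code for "no symptoms"

def q11_is_relevant(report):
    """Q11 is shown only if any of Q2-Q7 reports a symptom."""
    return any(report[f"q{i}"] != NO_SYMPTOMS for i in range(2, 8))

print(q11_is_relevant({f"q{i}": 0 for i in range(2, 8)}))               # False
print(q11_is_relevant({**{f"q{i}": 0 for i in range(2, 8)}, "q4": 2}))  # True
```

In Survey123 itself, such rules are expressed declaratively as relevance conditions on the form rather than in application code.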

Air Pollution Symptom Mapping Module: WebGIS App
The APSM web interface included two dashboard apps: one for real-time raw data presentation and another for presentation of the QA-checked results, which were published after the field data collection campaign. The real-time APSM data were presented on a point-symbolized map accompanied by an indicator summarizing the current number of reports and the activity curves of the users. The dashboard displayed the user activity ranking widget, a list of user nicknames ordered in a bar chart ranking their monthly activity (Sect. 2.1). The second dashboard included a point-symbolized map and statistics of the QA-checked symptom severity data. We assumed that this dashboard presented the results of data checked with the most robust QA system. The application allowed citizens to choose layers for each symptom. The statistical data corresponded to the current map range. The dashboard also contained information about the percentage of data reports rejected by the most robust QA system, and presented the proposed tier 3 SDG indicator for the SDG 11.6 target. We propose 11.6.3 as the next SDG (tier 3) indicator, which measures subjective well-being, as identified by citizens, defined as the absence of health symptoms caused by air pollutants. This indicator is weighted per 100,000 population. Contrary to air pollution-related hazard medical research (e.g., 11.6.2), SDG 11.6.3 reflects the well-being of citizens as a result of unpolluted air and no nuisance from breathing city air. SDG 11.6.3 is defined as the ratio of the percentage of surveys reporting "no APS symptoms" for each APS question (Pn) to the total number of city inhabitants (Nc). The indicator is measured monthly, due to the variation of air pollution levels throughout the year.
SDG 11.6.3 = (Pn / Nc) × 100,000

where Pn is the percentage of reports without any symptom observed and Nc is the total number of city inhabitants. To define the minimum monthly number of reports required for inference, we carried out a significance test for a proportion.
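The indicator and the minimum monthly sample size can be sketched as follows. This is a sketch under assumptions: applying the per-100,000 weighting to the percentage Pn reproduces the monthly values reported for Lublin later in the paper, and a proportion of p = 0.3 in the standard sample-size formula is our assumption that reproduces the stated minimum of 322 reports (the fraction value is truncated in the source text).

```python
def sdg_11_6_3(no_aps_reports, total_reports, inhabitants):
    """SDG 11.6.3: percentage of 'no APS' reports (Pn) over the city
    population (Nc), weighted per 100,000 inhabitants."""
    p_n = 100.0 * no_aps_reports / total_reports  # Pn, in percent
    return p_n / inhabitants * 100_000

def min_monthly_reports(p, error=0.05, z=1.96):
    """Sample size for a test of proportion (infinite-population form)."""
    return z ** 2 * p * (1 - p) / error ** 2

# Lublin (Nc = 340,000), March 2018: 350 of 889 raw reports were 'no APS'.
print(round(sdg_11_6_3(350, 889, 340_000), 2))  # 11.58
print(round(sdg_11_6_3(292, 660, 340_000), 2))  # 13.01 (after QA)
print(int(min_monthly_reports(0.3)))            # 322
```

The two printed indicator values match the March figures reported before and after QA checking in Sect. 3.3, which supports the reconstructed weighting.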
Statistical tests were carried out both before and after implementation of the most robust QA system (QAs8), specifying the statistical error for the test of proportion, with the assumptions α = 0.05, a confidence level of 95%, and fraction 0. (Fig. 3).

Air Pollution Symptom Mapping: Results Sharing through Web Apps
The first app, an introduction to the project (Fig. 4), is based on the Esri Story Map Cascade template, which is used for building narrative web mapping apps by combining images, maps, and multimedia context with narrative text (https://storymapsclassic.arcgis.com/en/app-list/cascade/). The application has an educational function for the citizens involved. It provides educational materials about air pollution and the APSM project idea, as well as extended mobile and web app tutorials and technical knowledge. Button number 2 links to a web version of the Survey123-based application for data collection. During the field data collection campaign, the reports were mapped, in real-time, in a point-symbolized data layer. Each point represented the location and code attributes of an individual symptom severity observation reported by a user through the mobile app. The mapping module, based on a set of web apps, was responsive and compatible with the Survey123 app. Cooperation between the Survey123 mobile app and the web mapping module was based on the typical WebGIS architecture (Lupa et al., 2017). The application presents the raw data collected before the QA process and contains an operations dashboard-based interface consisting of five modules: a map with the raw APSM reports collected during the crowdsourcing campaign (Fig. 5a); a legend (Fig. 5b); an indicator counting the total number of reports (Fig. 5c); a histogram of citizen activity from the beginning of the crowdsourcing campaign to the current moment, which changes dynamically according to the map (Fig. 5d); and a citizen activity ranking, divided by month and cumulatively (Fig. 5e), as described in Sect. 2.1.
The time slider app is based on the Esri Time Aware configurable template. It includes a map of point-symbolized QA-checked reports accompanied by a time slider tool, which displays the increase of collected data over the entire duration of the crowdsourcing campaign. The time slider can move automatically (with a play button) or can be moved manually to the required date (Fig. 6).

Results
The QA framework is key for useful and effective sCS; thus, the results mostly focus on the implemented QA mechanisms and the robustness analysis of the QA system variants for air pollution-related symptom mapping. The results also cover the achievement of the proposed SDG 11.6.3 indicator, based on the APSM framework. We first focus on the field data collection campaign, which provided the input for our analysis. The citizen activity curves indicated the varying and steadily decreasing activity of the citizens. For the QA robustness results, our key focus was the eight QA system variants, combinations of five logic-based QA mechanisms, which rejected from 18.3% to 28.6% of reports with data bias. As a result, we created GeoWeb tools, based on the ArcGIS platform, which were configured and customized to the study requirements and finally proposed as the APSM framework.

Data Collection Campaign Outcomes
The method was adopted in the city of Lublin and Lubelskie voivodship, located in Eastern Poland. The data collection campaign lasted for one academic semester, from February to May 2018. At the beginning of February, we started the crowdsourcing campaign, recruited citizens, and informed them of and promoted the research project. We created a group of citizens involved in the project, comprising 56 students from different faculties of the University of Life Sciences in Lublin, who participated as volunteers; of these, 30 Spatial Management students formed the core citizen group of the project. A total of 18 non-academic citizens joined the research and collected data together with the student group. They became members of the CS-Community-UW (Citizen Science Community of University of Warsaw) Group (https://arcg.is/WeqfK), which was set up to share all project data, materials, and apps in GeoWeb, in order to provide continuous access to the results. The group was managed by researchers and Ph.D. students of the University of Warsaw and educators, together with the student organization of Spatial Management of the University of Life Sciences in Lublin. The citizens participated in workshops and trainings until the end of February and received access to educational materials and instructions through the Esri Story Map Cascade app template, a web mapping app template for building narrative apps. During the data collection campaign, 1936 APS reports were sent by citizens to the cloud, which were used as the input to the QA mechanism-developed database. While citizens collected data, their activity was monitored and updated every month in a user activity module implemented in the dashboard app (Sect. 3.3.2). Analysis of the activity curve indicated that citizen activity peaked for 11 days at the beginning of the campaign (39-48 reports per day) and then decreased.
After this, activity stabilized at between 4 and 29 reports per day, with two peaks of 32 and 34 reports per day (Fig. 7). The students were more active than the non-academic citizens. The most active student collected 25 reports in April, whereas the most active non-academic citizen collected 7 reports in March.

Robustness of data quality assurance mechanisms
During the initial stage of data set filtering, 19 outliers with no geolocation were eliminated. Another 68 APS records were excluded as duplicate reports at the same geolocation within a short (5-minute) interval. The horizontal positioning accuracy varied from 0 to 100 m, and 93% of reports had a horizontal accuracy between 0 and 30 m. We also rejected 26 outliers whose horizontal accuracy error exceeded 100 m. As a result, 1823 of the 1936 reports remained and were used as the object of our logic-based QA implementation study. After the implementation of the QA mechanisms in the database, their robustness was analysed.
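The initial filtering stage can be sketched as follows. This is a hypothetical sketch: reports with no geolocation, GNSS accuracy outliers (> 100 m), and duplicates at the same geolocation within a 5-minute window are dropped; the field names (user, lat, lon, time, h_acc) and the coordinate-matching tolerance are illustrative assumptions, since the paper does not specify them.

```python
from datetime import datetime, timedelta

def prefilter(reports, max_h_acc=100.0, dup_window=timedelta(minutes=5)):
    kept, last_seen = [], {}
    for r in sorted(reports, key=lambda r: r["time"]):
        if r["lat"] is None or r["lon"] is None:
            continue                                  # no geolocation
        if r["h_acc"] > max_h_acc:
            continue                                  # accuracy outlier
        # Round to ~10 m to decide "same geolocation" (assumed tolerance).
        key = (r["user"], round(r["lat"], 4), round(r["lon"], 4))
        if key in last_seen and r["time"] - last_seen[key] <= dup_window:
            continue                                  # duplicate report
        last_seen[key] = r["time"]
        kept.append(r)
    return kept
```

Reports from the same user at the same rounded location are kept again once more than the five-minute window has elapsed.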
According to Table 6, the three most robust QA mechanisms for our specific case study were: repeating in two variants (more robust, repeating2 (Rp2); less robust, repeating1 (Rp1)) and start-check in the more robust variant (SC2). The repeating2 mechanism excluded 23.1% of reports, repeating1 eliminated 10.6% of reports, and start-check2 eliminated 6.9% of reports. Most data bias resulted from inconsistencies in the repeated questions. Three QA mechanisms (start-check, cross-validation, and repeating), which were implemented in two variants (i.e., less robust and more robust), were analysed in terms of the report reduction in each variant. For the start-check mechanism, the two variants rejected 71 of the same reports, while start-check2 eliminated 54 more reports than start-check1. Cross-validation1 and cross-validation2 rejected the same 54 reports, while cross-validation2 additionally eliminated 54 observations. For the repeating mechanism, repeating1 and repeating2 were compatible for 194 reports, while repeating2 was 12.5% more robust than repeating1, excluding 228 additional APS reports (Table 7). The largest data bias was related to the consistency between the general well-being question and its repeated query (repeating1: 10.6%; repeating2: 23.1%). This result could have been produced by the inaccuracy of the repeated question structure or by citizens misunderstanding the question. The start-check2 mechanism rejected 6.9% of reports, which means that the severity symptoms did not align with the general well-being assessment. The sequence mechanism was the least robust (0.9%), indicating that citizens rarely filled in the form automatically; that is, without carefully reading the answers. A high rejection rate was observed for the QA mechanisms start-check2 (6.9%), cross-validation2 (5.9%), and repeating2 (23.1%), which were between 46% and 57% more robust than their alternative variants (start-check1, cross-validation1, and repeating1, respectively).
Due to its high rejection rate, start-check2 was found to also reject some consistent reports. Finally, the QA mechanisms were combined into eight QA system variants (QAs1-8; Table 8). Implementing each subsequent QA mechanism changed the database, considering the QA system functions. For this study, the most robust QA systems were QAs8 and QAs7 (28.6%) and QAs2 and QAs5 (27.3%), which best reduced the number of falsely filled reports in the survey. They increased the quality of the collected data, but rejected a percentage of reports that might have contained valid information; as a result, some valuable data would be lost. The least robust were QAs1 (18.3%) and QAs3 (20.0%), which could not identify all the reports with false data, thus decreasing the quality and validity of the research results. As mentioned above, two pairs of QAs variants reduced the same data set and replicated the result database (QAs2 with QAs5; QAs7 with QAs8), thus allowing their interchangeable use. To reduce replication in the results, the studied QA framework was limited to six QAs variants (with QAs5 and QAs7 removed). For another location (i.e., country, continent) and societal structure, the robustness ranking of the six QAs variants could be different, and the particular QA mechanisms could be more or less effective than in the considered case study in Lublin.
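The way individual mechanisms combine into a QA system variant, and how its robustness is measured as a rejection rate, can be sketched as follows. The toy mechanisms here are illustrative assumptions, not the paper's actual rules.

```python
def make_qa_system(*mechanisms):
    """A QA system keeps a report only if every mechanism accepts it."""
    return lambda report: all(m(report) for m in mechanisms)

def rejection_rate(qa_system, reports):
    """Robustness of a QA system: percentage of reports it rejects."""
    rejected = sum(1 for r in reports if not qa_system(r))
    return 100.0 * rejected / len(reports)

# Toy mechanisms for illustration (not the paper's conditional statements):
not_blank = lambda r: any(v != 0 for v in r.values())
in_range = lambda r: all(0 <= v <= 3 for v in r.values())

qas = make_qa_system(not_blank, in_range)
reports = [{"q1": 1, "q2": 0}, {"q1": 0, "q2": 0}, {"q1": 5, "q2": 1}]
print(rejection_rate(qas, reports))  # two of three reports rejected
```

Because a system rejects the union of what its member mechanisms reject, adding a mechanism can only keep the rejection rate the same or raise it, which matches the ordering of the QAs variants in Table 8.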
SDG 11.6.3 as the air pollution impact on citizen well-being indicator

The proposed SDG 11.6.3 indicator was calculated for the Lublin case study, for each month of the field data collection campaign (March-May 2018). Lublin city has a population of 340,000 people, which was used for the SDG 11.6.3 calculation. The monthly survey statistics and SDG 11.6.3 are presented in Table 9, comparing the results before and after implementing QAs8, the most robust QA system. According to the significance test for a proportion for Lublin city, with a 5% possible statistical error, the minimum number of reports is 322 per month. In March and April, the minimum number of reports was reached both before and after QA checking.
In May, the minimum number of reports was not achieved for the dataset after QA and, so, this data set was not included in further analyses. The highest total number of reports was collected in March (889), of which 350 reports were marked as "no symptoms (no APS)". This indicates that 61% of surveys in March reported APS. After the QA process, the total number of reports decreased by 26% (to 660) and the number of reports marked as "no APS" was reduced to 292; thus, the share of surveys reporting APS increased to 66%. The minimum number of reports was achieved after 15 days of field data collection. To achieve the same minimum number in the QAs8-checked database, five more days were needed. The maximum statistical error for 889 reports, before QA checking, was 3%; after QA, it was 4%. The SDG 11.6.3 indicator in March for the reports before QA was 11.58%; after QA, it increased to 13.01%. In April, 542 reports were collected, 210 of which were marked as "no APS". QAs8 reduced the total number of reports by 31%, while "no APS" reports were reduced by 11%. Therefore, 50% of QAs8-checked reports were marked as "APS" in April. A total of 17 days were needed to achieve the minimum number of reports before QA, and eight more days for QA-checked data. The maximum statistical error for this sample was 4% before QA and 5% after QA. The SDG 11.6.3 indicator for April was 11.40% for the data set collected before QA, and increased to 14.67% after QA. This means that the SDG 11.6.3 indicator, based on the data after QA, was higher in April than in March. In May, the total number of reports reached 392 surveys, of which 194 were marked as "no APS". The maximum statistical error was 5% and the SDG 11.6.3 indicator was equal to 14.56%, the highest in the data set before QA. However, as mentioned above, this data set could not be compared to the data after QA, as the minimum number of reports was not achieved.

The implementation of QAs8 increased the SDG 11.6.3 indicator value by 1.43 percentage points for the March 2018 data set and by 3.27 percentage points for that of April. It follows that the SDG 11.6.3 value for April 2018 increased by 30% after QA checking and, so, the value of the air pollution impact on citizen well-being changed significantly. The SDG 11.6.3 value for March was 1.55 percentage points lower than that for April when analysing the data after QA. Comparing the SDG 11.6.3 for March, April, and May in the data before QA, the increase of the indicator value is conspicuous, which may have resulted from a decrease in anthropogenic pollutant concentrations in urban air over these months, such that their simultaneous occurrence with pollen had a lower intensity. The citizen scientists engaged in the project collected enough APS reports in March and April to calculate the SDG 11.6.3 value. Comparing these two months, the minimum number of QAs8-checked reports in March was achieved after 20 days of data collection; in April, this was achieved after 25 days. In May, the data collected after the QAs8 process were not enough for significant statistical analysis. It seems that the activity and motivation of the citizens became too low in this month. Monthly values of the SDG 11.6.3 indicator, as calculated on the QAs8-checked data set, were added to the GeoWeb dashboard app (Fig. 8), such that each user could track the progress of the SDG achievements, presenting the monthly impact of air pollution on their subjective well-being status. This information was presented together with the other APSM results of the whole crowdsourcing campaign, in terms of the spatial location of eligible symptom-related layers (Q1-Q8; Figs. 8a, 8b), the number of QA-checked reports (Fig. 8d), the general level of citizen comfort (Fig. 8e), individual symptom severity values (Q2-Q8; Fig. 8f), and the percentage robustness of the QAs8 implemented in the project.

Discussion And Conclusions
In this study, we introduced social innovation into the urban air pollution issue, where citizens act to assess air pollution using their symptoms, thereby extending the air pollution monitoring paradigm. We considered the spatial context; therefore, a map was used to spatially model symptom severity. The web mapping application is public and provides information about air pollution in specific areas of the city, such that citizens can learn which areas could positively or negatively impact their well-being, according to information about the severity of the symptoms observed by the APSM project members. The tier 3 SDG indicator (11.6.3) proposed in this study highlights the crucial role of sCS in achieving SDG 11.6. The value of SDG 11.6.3 is presented in the open web mapping app, such that citizens can observe and compare the percentage indicator of the impact on their well-being in the city within individual months.
Although the potential for using crowdsourced data to monitor urban air pollution was demonstrated here, the minimum report sample can be considered a limitation of the project. Citizen science is an emerging trend in Poland and, so, specific motivation mechanisms need to be elaborated, based on the citizen motivation recommendations developed in other projects (Nov et al., 2011; McCrory et al., 2017). As the APSM focuses on citizens who are interested in the effect of air pollution on their health and well-being, our study is wider than other projects dedicated only to people diagnosed with health problems. This means that the motivation mechanisms involved differ from those which are appropriate for patients (e.g., free medical consultations).
Crowdsourcing projects rely on a suite of methods to boost data quality and account for data bias (Kosmala et al., 2016). To gain better data quality in CS, three-step mechanisms are generally recommended, with considerations taken before, during, and after the data collection process (Wiggins et al., 2011). The APSM method, using logical rules to reject inconsistent database entries, was successfully implemented after the data collection process. Depending on the expected data quality level, different mechanisms were tested. However, by choosing a single QA system (Table 5) and combining rejected reports with the username (nickname and PIN), each user's quality rank can be calculated. A user who passes the QA system could then be rewarded with trust and reputation statuses. Such citizen trust models have been proposed by Alabri and Hunter (2010), who developed a social trust metrics framework, and Langley et al. (2017), who applied a reputation model and used a reputation score system to determine the threshold for accepting volunteered data. The score should be based, for each citizen, on the ratio of reports accepted by the QA system to the total number of their surveys: the higher the ratio of accepted reports, the higher the level of citizen trust.
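The proposed trust score can be sketched as follows; the 0.8 acceptance threshold is an illustrative assumption, as the paper leaves the threshold to the reputation model.

```python
def trust_score(accepted, total):
    """Ratio of a user's QA-accepted reports to their total reports."""
    return accepted / total if total else 0.0

def is_trusted(accepted, total, threshold=0.8):
    """A user whose accepted-report ratio meets the (assumed) threshold
    gains a trusted/reputation status."""
    return trust_score(accepted, total) >= threshold

print(trust_score(18, 20))  # 0.9
print(is_trusted(18, 20))   # True
```

Computed continuously against the QA services, such a score could update a user's reputation while the campaign is still running.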
In conclusion, the APSM data quality mechanisms implemented after the data collection stage, but referenced to particular users' reports, could be used to develop a user motivation system (which, in the current study, was limited only to user activity). Furthermore, the technological implementation of QA systems as cloud services may enable the ranking of user trust and reputation during the data collection process. Then, not only the quantity but also the quality of user reports can be analysed, and their level of reputation could be assessed and presented during the campaign. In large-scale CS projects such as iNaturalist (iNaturalist.org), the trust and reputation of citizens is based on their activity: "The users community ensures that data is reliable, but it also gives the opportunity for fellow users to gain knowledge" (Nowak et al., 2020). The APSM project will be further developed with machine learning methods, which will allow us to train QA models to classify true and false reports and filter them in real-time on the map. Moreover, during the collection process, another solution can be implemented: GPS trajectories. Currently, Q10 (Table 2) requires users to estimate how long they have stayed in a location. Using GPS or GSM data to characterize user mobility patterns and analyse user spatial trajectories could increase data quality and make the application smarter. Changes in user trajectories could also trigger an individual push notification to maintain or cancel the APS report, depending on the change of location. Finally, to gain better data quality, improvements before the data collection stage may also be proposed. The APSM was carried out as a Polish case study. As CS has been recognized as an emerging trend in Poland, we found it necessary to promote the sCS concept through the European CS platform (https://eu-citizen.science) and to engage citizens in air pollution monitoring by organizing training sessions.
What differentiates CS from other VGI activities is the fact that CS can be taken up by any volunteers who have undertaken standardized training. Learning how to observe one's own symptoms in reaction to air pollution, relating them with air quality information and green pollutant concentration levels, and regular symptom recording were considered prerequisite parameters for ensuring the quality of APSM.
From a practical point of view, the data quality (i.e., completeness, spatial accuracy, thematic resolution, timeliness, and logical consistency) should be suitable for the project purpose. The quality of CS data is expected to be similar to data collected by professionals. According to Wiggins et al. (2011), some general solutions for improving crowdsourced data quality include: volunteer training (workshops), using a large sample size, data filters, data mining algorithms, using a qualifying system, voting for the best, reputation scores, online data and metadata sharing, citizen contribution feedback, reuse of data, and replicate studies; however, the purpose of our study was to create a data quality framework.
This confirms that data quality control mechanisms are an indispensable element of any citizen-driven research, hence also being effective in CS activities such as that considered in this paper. The removal of falsely completed surveys increased the quality and usefulness of the collected data. In our research, only 5.8% of data were eliminated due to positioning issues: lacking geolocation, duplicated reports at the same geolocation, or positioning accuracy errors. Farman (2015) identified 12% of crowdsourced data as having accuracy-related errors. Thus, we conclude that, in our sCS, this type of error did not pose a significant problem in terms of data quality; however, subjective data bias definitely does. Still, the problem of human bias in data must always be considered during data analysis. Human bias introduced into data can be mitigated by using clearly stated survey questions, additional training, and limiting the scope of the survey. We found that up to 29% of all collected surveys regarding air pollution objectively contained useless or false information. Kosmala et al. (2016) reported subjective data bias at levels between 5% and 35%, depending on the simplicity of the tasks assigned to citizens; Hube et al. (2019) presented data bias at 15-17% in a crowdsourced data set; and Eickhoff (2018) pointed out an accuracy rate reduction of 20% due to cognitive data bias. We recognize that our percentage share of data bias was high, highlighting the absolute necessity of a QA mechanism framework for sCS health-related projects. As QA mechanisms are created through the use of logical rules, they are easy to understand and can be crafted to match particular (expected or observed) error types in collected survey data. We found that QA mechanisms can be used to remove surveys that contain clearly defined errors.
Out of the five employed QA mechanisms, the most robust were those aimed at removing inconsistent user answers in the survey (i.e., the 'repeating' QA mechanism). These results suggest a lack of user engagement, as some users were not consistent with their own answers within the same survey. This finding may be due to a natural phenomenon associated with the human condition or to a survey questionnaire which fails to engage users. Future surveys employing sCS as a data source might expect many haphazardly completed user surveys. As up to 23% of all collected surveys were marked as containing errors highlighted by the repeating mechanism, at least this kind of QA should be applied to all further works using data collected with the help of CS.
The focus of our research was not on validation with digital sensors, but on eliminating logically inconsistent answers and technologically incorrect objects. To date, no QA mechanism framework had been formulated for APSM projects; thus, the proposed framework could be valuable for a wide group of projects that must manage a specific type of data subjectivity: the subjectivity of human symptoms.
The APSM method can capture the moment when air pollution changes. The observed health symptom severity can be validated against air pollution concentrations measured by air quality monitoring stations. If it is known whether citizens are diagnosed pollen allergy sufferers, collating this information with the current concentrations of pollen species increases the chance of confirming the impact of air pollution on citizen well-being.
The SDG 11.6.3 proposed in this paper is a new indicator, which proves that citizen science can make a meaningful contribution to the achievement of the SDGs. Citizens, together with scientists, built a reliable model of the impact of air pollution on the well-being of citizens in their city. According to Fritz et al. (2019), SDG tier 3 indicators have high potential for future contributions from citizen science, where methodologies are still being developed or data gaps occur. However, they also pointed out that data quality is one of the greatest limitations in this area and, so, quality assurance mechanisms are a crucial challenge for obtaining CS data which can readily contribute to the achievement of the SDGs. As a result of our research, we can confirm that not only QA mechanisms, but also citizen activity, is necessary for a CS contribution to SDG achievement. Despite crowdsourcing solutions being popular at present, we had difficulties in collecting the minimum number of reports for a statistically significant analysis. The results showed the decreasing activity of citizens, which meant there were not enough data to confidently calculate the SDG 11.6.3 value in the last month of the crowdsourcing campaign (May 2018). A total of 74 citizens participated in our study and, although they were invited to report their APS observations preferably once per day, there were not enough reports in May to reach the minimum number. We conclude that the group of citizens was too small and had limited motivation.
In the study, we applied a user activity rank model, which showed that citizen activity decreased over time, as is typical for a CS project (Geoghegan et al., 2016). The level of citizen engagement and motivation was highest at the beginning of the crowdsourcing campaign (100 reports per day), dropping after 14 days. The two peaks in the last month of the field data collection campaign (35 and 40 reports per day, May 2018) likely resulted from the motivational workshops with an educator, where citizens' rankings were discussed, thus increasing competition between the citizens. In summary, for a case study of Lublin, or any city of similar size and structure, a citizen group larger than 74 people is needed, and regular workshops are necessary to maintain their activity.
Due to their intuitive access and operation, the presented methods and tools are suitable for scientists, educators, and evaluators alike. Reducing data bias in real time is not possible without programmed web-based mechanism functionality: the configurable capabilities of AGOL allow for data filtering, but the filters are too basic for the conditional statement combinations which form the QA mechanisms.
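To illustrate what is meant by conditional statement combinations beyond simple attribute filters, the sketch below shows one possible way to encode the three QA mechanisms (cross-validation, repeating, start-check) as predicates over a report and combine them into a QA system. This is a hypothetical illustration, not the project's actual code; all field names (`general_comfort`, `symptom_severity`, etc.) and threshold logic are assumptions.

```python
# Illustrative sketch of conditional-statement QA mechanisms (hypothetical
# field names and rules; not the authors' implementation).

def cross_validation(report):
    """Reject a report whose three essentially related answers disagree,
    e.g. 'good' general comfort reported alongside non-zero symptom severity."""
    if report["general_comfort"] == "good" and report["symptom_severity"] > 0:
        return False
    if report["symptoms_present"] == "no" and report["symptom_severity"] > 0:
        return False
    return True

def repeating(report):
    """Ask the same question in a different way and compare the two answers."""
    return report["severity_scale"] == report["severity_repeat"]

def start_check(report):
    """Control question at the start of the survey; here we assume the
    campaign accepts outdoor observations only."""
    return report["location_type"] == "outdoor"

def qa_system(report, mechanisms):
    """A QAs variant is a conjunction of selected mechanisms: a report is
    accepted only if every mechanism in the variant passes."""
    return all(mechanism(report) for mechanism in mechanisms)
```

Different QAs variants can then be expressed simply as different mechanism lists passed to `qa_system`, which makes it straightforward to compare their rejection rates on the same report set.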
Conveniently, our database was set up on REST services, such that the QA mechanisms and their combinations could be implemented and analysed using desktop software in direct connection with the AGOL apps, which ensured the stability of the REST service-based data source. APSM modelling focused directly on the urban air pollution information shared with society and, as a result, represented the level of citizen well-being. Owing to the proposed tier 3 SDG indicator, the data obtained with regard to the measured air pollution could be output as a spatial model of city well-being, which is crucial for SDG 11.6.
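As an illustration of this kind of direct REST access, the sketch below assembles a standard ArcGIS REST feature-service query URL whose `where` clause pre-filters reports server-side before the QA logic runs in desktop software. The service URL and the `report_date` field are hypothetical placeholders, not the project's actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical feature-service endpoint; the project's real URL is not shown here.
SERVICE_URL = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/services"
               "/APSM/FeatureServer/0/query")

def build_query(where="1=1", out_fields="*"):
    """Build an ArcGIS REST 'query' request returning JSON features."""
    params = {"where": where, "outFields": out_fields, "f": "json"}
    return SERVICE_URL + "?" + urlencode(params)

# Example: restrict to the last month of the campaign (May 2018).
url = build_query(where="report_date >= DATE '2018-05-01'")
```

The returned JSON features can then be fed to the QA mechanisms described above, so the desktop analysis and the AGOL apps read from the same stable data source.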
The Acronym List (each mechanism studied and proposed in two variants of robustness):
- Cross-validation mechanism (variant 1 and variant 2): rejects responses using three essentially related questions.
- Repeating mechanism (variant 1 and variant 2): determines the quality of a report by asking the same question in a different way and comparing the answer against the other, previously asked questions.
- Start-check mechanism (variant 1 and variant 2): verifies report quality at the beginning of the survey and controls report quality during the analysis of each symptom severity answer.

Figure 1. GeoWeb architecture implemented for the APSM project: A) mobile survey app for field data collection; B) web app with educational and training materials; C) dashboard app for monitoring the data collection process and presenting APSM results; and D) quality assurance module.

Educational part of the application: cascade story map.

Time slider app following the data collection process over time.

Figure 7. Citizen activity curves during the field data collection campaign.

Web mapping application presenting the results of QA-checked data using QAs8 (operations dashboard): a. Map; b. Layer list; c. Legend; d. Indicator of QA-checked reports; e. Bar chart of citizen general comfort; f. Pie chart of citizen symptom severity; g. Pie chart of QA system robustness; and h. Proposed tier 3 SDG 11.6.3 indicator.