The dataset combines official public information from the “e-Gestor AB” platform of the Ministry of Health of Brazil (MoH)[10] with restricted data obtained under the Brazilian Access to Information Law (LAI). We manually extracted the public data on December 6, 2021. These data include the monthly count of PHC teams and their absolute and relative population coverage estimates. The restricted data provided monthly information on the MDP implementation and physician counts. We added previously consolidated spatial, demographic, and socioeconomic data, which are described and available elsewhere[9]. The final dataset has 1,509,870 observations and 35 attributes, aggregated by month from 1998 to 2020 and by policy-relevant geographic unit (country, macroregions, states, municipalities, and capitals). We automated all data processing/curation in the free and open-source software R. The code can be audited, replicated, and reused to produce alternative analyses.
Table 1 provides an overview of the files and datasets available in Synapse[11]. Data files 1-2 hold the code for the ingestion, transformation, and loading routines. Dataset 1 includes the raw PHC files, and datasets 2-3 comprise the workflow endpoints with the PHC and MDP data. Data file 3 builds the final dataset, which was integrated, harmonized, and enriched with spatial, demographic, and socioeconomic data[9]. The HTML files (data files 4-6) show type-specific information on the attributes of the intermediate and final datasets, including statistical summaries and missing-value frequencies. Data file 7 documents the metadata and attribute descriptions of the final dataset (dataset 4).
Table 1: Overview of data files/data sets.

| Label | Name of data file/data set | File types (file extension) | Data repository and identifier |
|---|---|---|---|
| Data file 1 | script_phc_ingestion | R code (.r) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 2 | script_pmm_ingestion | R code (.r) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 3 | script_master_phc | R code (.r) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 4 | phc_sprint_dataselfie | HTML (.html) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 5 | pmm_sprint_dataselfie | HTML (.html) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 6 | phc_master_dataselfie | HTML (.html) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Data file 7 | phc_master_overview | Excel (.xlsx) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Dataset 1 | phc_data_raw | zipped (.zip) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Dataset 2 | phc_clean_data | R data (.rdata) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Dataset 3 | pmm_clean_data | R data (.rdata) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
| Dataset 4 | phc_master_data | R data (.rdata) | Synapse: https://doi.org/10.7303/syn26529247 [11] |
Anyone can browse the content on the Synapse website, but downloading the files and datasets requires registering for an account with an email address.
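Once downloaded, the .rdata files can be opened with base R. The sketch below is illustrative only: because the names of the objects stored inside the published files are not stated here, it saves a toy object (`toy`, a hypothetical stand-in) to a temporary file and loads it into a separate environment, which is one way to discover what a file contains without overwriting objects in the workspace.

```r
# Stand-in for a downloaded file such as phc_master_data.rdata: a toy object
# is saved to a temporary path so the sketch runs without the real data.
toy <- data.frame(month = as.Date("2020-01-01"), n_teams = 3L)
path <- file.path(tempdir(), "toy_data.rdata")
save(toy, file = path)

# Load into a separate environment; load() returns the names of the
# objects it restored.
env <- new.env()
loaded_names <- load(path, envir = env)
print(loaded_names)

# Retrieve the first restored object by name and inspect its structure.
dat <- get(loaded_names[1], envir = env)
str(dat)
```

For a real file, `path` would point to the downloaded .rdata file instead of the temporary one.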
Data construction
The data workflow involved two main steps. The first step covered the ingestion, transformation, and loading routines for the PHC and MDP data. We downloaded three raw files from the “e-Gestor AB” platform with the number of PHC teams by municipality and month/year, and received one raw file via LAI with the MDP physicians’ names and their activity locations and periods. The essential features of the data transformation were (i) selecting and renaming variables and filtering observations, (ii) correcting the codes and names that identify geographic units, (iii) cleansing numeric values, e.g., removing special characters, (iv) cleansing inconsistent dates, e.g., swapping start and end dates of activities, (v) converting individual-level MDP data to ecological data and flagging municipalities where the program was implemented, and (vi) enriching the municipal datasets with data aggregated by state, macroregion, and country. This step produced two cleaned datasets ready for use in constructing the final dataset.
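Transformation steps (iii) to (v) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code from the published scripts: the column names (`municipality_code`, `start_date`, `end_date`, `n_physicians`), the helper `clean_number`, and the toy records are all hypothetical, and numeric fields are assumed to hold integer counts.

```r
# Toy individual-level MDP records; column names and values are hypothetical.
mdp_raw <- data.frame(
  municipality_code = c("110001", "110001", "120001"),
  start_date = as.Date(c("2014-03-01", "2014-06-30", "2015-01-15")),
  end_date   = as.Date(c("2014-05-31", "2014-04-01", "2015-02-20")),
  stringsAsFactors = FALSE
)

# (iii) Strip non-digit characters from counts stored as text
# (assumes integer counts, so "." is a thousands separator).
clean_number <- function(x) as.numeric(gsub("[^0-9]", "", x))
print(clean_number(c("1.234", "56*", " 789 ")))

# (iv) Swap start and end dates recorded in reverse order.
reversed <- mdp_raw$start_date > mdp_raw$end_date
fixed <- mdp_raw
fixed$start_date[reversed] <- mdp_raw$end_date[reversed]
fixed$end_date[reversed] <- mdp_raw$start_date[reversed]

# (v) Expand each record to the calendar months it touches, then count
# physicians per municipality and month (individual -> ecological data).
monthly <- do.call(rbind, lapply(seq_len(nrow(fixed)), function(i) {
  first_month <- as.Date(format(fixed$start_date[i], "%Y-%m-01"))
  data.frame(
    municipality_code = fixed$municipality_code[i],
    month = seq(first_month, fixed$end_date[i], by = "month"),
    stringsAsFactors = FALSE
  )
}))
monthly$n_physicians <- 1L
physicians <- aggregate(
  monthly["n_physicians"],
  by = monthly[c("municipality_code", "month")],
  FUN = sum
)
print(physicians)
```

Every municipality-month that appears in `physicians` has at least one active physician, so a flag for program implementation could be derived from the presence of a row.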
The second step comprised data integration, harmonization, and enrichment with spatial, demographic, and socioeconomic characteristics[9]. We combined the cleaned datasets by month/year and by the codes of the geographic units. The PHC absolute coverage estimates followed the MoH method: number of FHT × 3,450 + (number of pBHT + number of eBHT) × 3,000. The PHC relative coverage estimates used the total population size in the current year as the denominator. The absolute estimate was set to the population size whenever the formula's result equaled or exceeded it; consequently, the relative estimates had a maximum value of 100%. Notably, the PHC indicators do not include the physicians from the MDP. The R code and the data processing/curation were peer-reviewed, and their results were compared with the information on the official site.
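The coverage formula and its cap can be illustrated with a short R sketch. The function name `phc_coverage`, its argument names, and the toy values are hypothetical and are not taken from the published scripts or the dataset.

```r
# Hypothetical helper implementing the coverage formula quoted above.
# Argument names are illustrative, not the published column names.
phc_coverage <- function(fht, pbht, ebht, population) {
  # Potential coverage before capping: FHT x 3,450 + (pBHT + eBHT) x 3,000
  potential <- fht * 3450 + (pbht + ebht) * 3000
  # Absolute coverage cannot exceed the population size
  absolute <- pmin(potential, population)
  # Relative coverage in percent, which is therefore at most 100
  relative <- 100 * absolute / population
  data.frame(absolute = absolute, relative = relative)
}

# Toy values for two geographic units, not taken from the dataset
example <- phc_coverage(
  fht = c(2, 10),
  pbht = c(0, 1),
  ebht = c(1, 0),
  population = c(20000, 30000)
)
print(example)
```

In the first toy unit the formula gives 9,900 people (49.5% of 20,000). In the second it gives 37,500, which exceeds the population of 30,000, so the absolute estimate is capped at 30,000 and the relative estimate at 100%.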