A Rational Framework for Development Iranian Twin Registry: Lessons Learned from Developed Twin Registries

Background: With advances in health-science technology and the ability to record and organize large amounts of data, registries have been established in various clinical domains. The specific genetic structure of twins has enabled researchers to find answers about the role of genetics and environment in medical sciences. Thus, twin registries were developed across the world to support twin studies. The main objective of this study is to devise a conceptual model for developing the Iranian twin registry to ensure the success of this registry. Methods: In this descriptive and qualitative study, a combination of literature review, brainstorming, and focus group discussions was applied to develop a conceptual model of the twin registry. Based on our qualitative and thematic analysis, the workflow of information gathering and the different data levels are described in this study. Results: According to the characteristics of established twin registries worldwide, the success factors in registry implementation were recognized. Moreover, based on our objectives, a hierarchical conceptual model of the registry was devised. The sources of information, the different levels of data, and the information flow were determined based on this model. The workflow of data gathering was modeled to clarify the data collection and storage processes. Conclusion: Suggesting a conceptual model and standard framework for twin registry development at the national level, based on the experiences of other countries, could guide researchers in implementing twin registries efficiently. Moreover, the next steps of implementing a twin registry in a standardized and stepwise way are outlined in this project.


Background:
Twin studies provide a valuable means for clinical research on a wide range of health topics [1][2][3][4]. The specific genetic structure of twins has enabled researchers to find answers to some puzzling questions in medical sciences by studying the impact of genetic and environmental factors in different domains of medicine, using the health-related data of twins and their long-term clinical follow-up [5,6]. Large clinical trials and longitudinal cohort studies are necessary to achieve such valuable results [7,8]. With the advancement of healthcare technology and the possibility of using large data repositories, clinical registries have been introduced to overcome the challenges of traditional clinical studies [9,10].
Clinical registries give researchers the opportunity to aggregate health-related data from different resources into a large database to support intelligent data analysis and mass screening [11,12]. They are known as integrated systems that gather information from different sources using structured minimum data sets [13]. By equipping registries with knowledge discovery capabilities, researchers can extract valuable information and even generate new hypotheses in different fields of medical science [2,14]. Twin research is one of the disciplines in which creating registries has been highly beneficial [15].
In the last three decades, national twin registries have been founded in developed countries to create national databases containing primary and secondary data related to identical (monozygotic) and non-identical (dizygotic) twins [8,16]. In addition to other benefits, organizing and recording twins' information can decrease the bias of population-based surveys [4,7,17,18].
Developing a twin registry at the national level requires outreach efforts. Indeed, developing a twin registry involves many challenges in design, information gathering, registration, follow-up, and implementation [15,16,19].
One of the most important challenges in registering twins is the difficulty of collecting twins' health data, because twins' information cannot be accessed through health information systems [19]. The next challenge is related to the complicated nature of developing any national registry. Morsey et al. [20] highlighted the challenging nature of developing health registries in their study. They proposed that the initial steps for developing a national registry are defining a proper strategic plan for the whole process and designing a conceptual model of the information flow. Lack of attention to these issues has led many countries to fail in their efforts to develop national registries. Therefore, the main objectives of this study are: (1) defining a strategic plan for twin registry implementation, (2) devising a conceptual model for developing the Iranian twin registry, and (3) generating a suitable workflow process to ensure the success of this registry.

Methods:
In this descriptive and qualitative study, a combination of literature review, brainstorming, and focus group discussions was undertaken to answer our research questions. The applied methodology can be described in four phases: (1) literature review, (2) focus group (FG) meetings to gather expert opinions, (3) thematic analysis and synthesis of the results, and (4) summarizing the results and generating the models.

Literature review:
In the first phase, a literature review was conducted to identify and review the challenges, pitfalls, and achievements of twin registries throughout the world. Scientific databases were searched using PubMed, Scopus, Web of Science, BMJ, and ScienceDirect. The search strategy included keywords such as "Twin registry", "Twin registry studies", "National twin registry", "Twin registry outcomes", "Twin studies in the world", and "Twin registries". The official websites of developed twin registries were also searched to identify their characteristics.
All registries found were explored in terms of country, type of study, primary goal, population, method of information gathering, biobank samples, kind of registry, and outcomes. Inclusion was limited to articles on the design of twin registries and their characteristics.
Focus group meeting to gather expert opinions:
Following the literature review, an FG technique was employed to discuss the foremost requirements for implementing the Iranian national twin registry. The experts were chosen by purposive sampling based on their expertise and the objective of the study. Then, the researcher (MG) contacted them by phone and email, explained the objective of the project, and asked for their participation. Out of 15 experts, ten agreed to participate in our FG discussions.
The dates of the meetings, held at the Tehran University of Medical Sciences, were set according to the experts' preferences. In these meetings, the participants were asked about their agreement on the essential requirements and characteristics of the Iranian twin registry.
Based on the literature review results, the different levels of information, the predefined categories, and our objectives in conducting this research were explained thoroughly at the beginning of the meetings. Then, the main requirements and the qualitative analysis were discussed in the FG meetings. The FG discussions allowed the investigators to gather expert opinions and discuss the research questions face to face. The FGs typically lasted 60-90 minutes, although the length of meetings varied according to the topics and the participants' contributions during each session. Before each discussion, the objective of our study was explained to participants, and verbal consent was obtained. One of the researchers (M.G), a Ph.D. student with experience in designing health information systems and in medical research, took notes to capture the significant points of discussion. Transcripts were checked by other researchers at the end of the sessions. Key points and quotes from each FG were obtained by transcription. FGs were continued until data saturation was reached. Thus, five FG discussions proved to be sufficient.
Thematic analysis and synthesis of the focus group results:
Thematic analysis, a qualitative analysis method, was applied to analyze the FG results and literature review findings. In this study, thematic analysis was conducted following the method introduced by Braun and Clarke [21]. Atlas.ti V.6 software was used to manage all of the transcriptions. First, one of the researchers rechecked the transcriptions and FG data, and all of the FG data were imported into the software. Second, open codes with interpretive notes were assigned to the notes by two researchers independently. Third, all of the data were combined and the coding list was generated. Main themes were then identified based on the repeated codes. Potential themes were reviewed by all authors to confirm their recurrence. Finally, a thematic map was designed to generate the emergent themes and subthemes, as suggested by Braun and Clarke.
Summarizing the results and generating the models:
In the final phase, all of the findings were categorized and summarized by combining the literature review findings with the expert panel results. Based on the main objectives of our study, the main stakeholders, the information resources, and the workflow of information gathering were determined. All of the obtained information was classified and analyzed using SPSS. Following the literature review and five FG discussions, a draft of the expected specifications was produced. Subsequently, the first version of the conceptual model of the Iranian twin registry was produced and refined based on the experts' views.
After model development, two FG discussions were held to validate the reliability and accuracy of the designed models. Additionally, the methodology and the various steps of developing a twin registry were established in this phase.

Results:
Expert panel:
The multidisciplinary expert team comprised ten experts: one internist, one pharmacotherapist, three epidemiologists, two pediatricians, two midwives, and one medical informatics specialist. All expert panel members participated in the FG sessions. The demographic characteristics of all members are presented in Table 1.
Characteristics of established twin registries based on literature review:
More than 2500 studies were retrieved when searching the scientific databases. After duplicate removal, 2300 articles remained. After title and abstract screening, approximately 530 articles remained. Finally, 39 studies were selected as relevant through full-text screening, based on the main objective and research questions of our study. Additionally, information about established twin registries was extracted from gray literature and their official websites. All of the retrieved studies were analyzed and classified based on predefined categories.
According to the reviewed articles, 32 twin registries at the national or state level were identified across the world. The investigation showed that most of them were founded in developed countries. Owing to the high potential of twin registries as a statistical and epidemiological research tool, the rate of establishing twin registries has increased in recent decades.
Of the 32 registries identified, almost 76% are national twin registries and 7% are provincial. The rest belong to non-governmental organizations.
Regarding the type of study, 79% of them applied a longitudinal design. The oldest twin registry is the twin registry of Denmark (DTR), with more than 86 years of history [22]. Some twin registries are classified as volunteer registries. The Australian Twin Registry (ATR), the largest volunteer twin registry, has successfully registered 17% of the identical and non-identical twins in Australia [23].
Today, 68% of twin registries are equipped with biobanks. They use different samples from twins based on their objectives. The analysis of samples by sample type is shown in Table 2. All of the registries assessed zygosity, using different methods. These techniques include a standard questionnaire, biological markers, or a mix of both. Some registries applied decision tree rules in questionnaire-based methods. One of the best-known questionnaires used in twin studies is the "peas in a pod" questionnaire. The survey showed that developed countries such as Sweden [24], the Netherlands [25], England [26], Finland [27], and Australia [23] founded their national twin registries to provide a reliable platform to enhance longitudinal twin studies. Concerning outcomes, twin registry studies cover a wide range of clinical and non-clinical disciplines. They have reached novel findings in various health topics, such as finding new methods for diagnosing cancer, discovering causes of premature death, studying fetal abnormalities, determining underlying hereditary factors in chronic diseases, assessing the efficacy of new drugs, and identifying risk factors for behavioral disorders [3,8,15,28,29,30]. In some countries, the development of a twin registry has also led to the publication of more than fifty research articles annually. All the results obtained from the analysis of the characteristics of the developed registries were summarized in a table by the authors. The results were then visualized as a network of the most frequently used words in Fig. 1.
Establishing twin registries and conducting more studies in developing countries are needed to identify the specific genetic and environmental variables that contribute to health and disease. Thus, developing the Iranian twin registry would be highly valuable and beneficial in Asia.
The analysis of the reviewed studies revealed that the most important issues in the development and implementation of twin registries are the standardization of data types, information flow, data collection methods, storage methods, data analysis, required reports, and data linkage. All of the studies emphasized that data gathering should follow a standard workflow to enhance accuracy and outcomes.
In terms of software or platform, most of the registries did not report the type or description of the platform used to collect twins' data. Thus, the researchers contacted some registries via email or phone to check whether such platforms existed. According to the studies and available results, about 65% of registries used information management software to organize their information. Forty-five percent of institutions developed these systems in-house, and the rest commissioned their systems from software companies as needed. None of the existing registries stated that they used downloadable applications or open-source software, because they believed that such software did not meet their requirements.
Thematic analysis and discussion findings:
Different views and opinions were expressed in the FG discussions regarding the design and implementation of a twin registry at the national level. The participants pointed to obstacles, opportunities, requirements, and challenges in achieving a successful implementation. All topics were organized into key themes by thematic analysis. When no new aspects emerged, data saturation had occurred, and the last group discussion took place. A total of five main themes were extracted from the thematic analysis of the FG discussions. Table 3 shows these main themes along with quotations from each topic.

1 Lack of a proper strategy plan
Expert 1: "To achieve our main objectives, the standard strategic plan should be defined"
Expert 2: "It is not possible to start a long-term study at the national level without a codified program"
2 Various data sources
Expert 1: "Because twin information must be gathered from a variety of sources, it must be determined before studying."
Expert 2: "Different levels of information and data at the national level and how different levels of data communicate before data collection should be defined"
3 Define classification for twins
Expert 1: "Since twin information will be collected from different age and social groups across the country with different characteristics, proper information classification in different age groups can help to implement the registry more successfully."
4 Information workflow
Expert 1: "The data collection process should be defined clearly. So that a team of different professionals can accumulate data according to a standard plan."
Expert 2: "If there is no specific plan, everyone can collect data in their own way and it might cause data redundancy."
Expert 3: "Without a standard workflow, invalid data might be entered into the database. Therefore, a proper solution must be considered to prevent the entry of invalid data."
5 Conceptual model
Expert 1: "Since there is no specific conceptual model for twin registers in the world, devising a conceptual model can overcome the obstacles and future barriers to registry implementation, and it could ensure the success of this system."
Expert 2: "Developing a conceptual model could be considered as a roadmap for future planning."
6 Data dictionary
Expert 1: "A set of information that should be recorded in the system is one of the important requirements of any registry."
Expert 2: "According to the initial aims of the registry, the required data in twin researches should be determined and explained."
Expert 3: "The required data must be approved by experts."

Based on these extracted themes and lessons learned from the established registries, the steering committee planned to design rational models for creating a population-based twin registry with access to twins' health data. Consequently, the suggested models to satisfy the requirements are described in the following. However, data dictionary development and the determination of minimum data sets are not discussed in this article and need further study.
Developing a twin registry strategic plan:
The examination of established twin registries throughout the world helped us to determine the key features of a twin registry. In addition, the feasibility and the strategic plan of developing a twin registry in Iran were discussed by the researchers. All of the processes that should be performed for twin registry development are represented in Fig. 2. This strategic plan was established based on the research objectives and experiences from published studies. Establishing a standard strategic plan can help researchers implement an effective, standard, and usable national registry.
Designing the framework for different levels of data:
The survey indicated that one of the most important challenges in twin registries is data gathering. Without a standard structure for the data sets, the gathered information will be of little use. The three main categories proposed for gathering data in Iran are twin newborns, school-aged twins, and adults. This classification facilitates gathering, storing, and analyzing twins' data. Moreover, the sources of data can be defined more easily.
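As an illustration of the three-category classification, the following Python sketch assigns a twin to an age group from the date of birth. The cut-off ages (6 and 18 completed years), the decision to place pre-school children in the newborn group, and all names in the code are assumptions made for this example; the study itself does not specify them.

```python
from datetime import date
from enum import Enum


class AgeGroup(Enum):
    NEWBORN = "twin newborns"
    SCHOOL_AGED = "school-aged twins"
    ADULT = "adults"


# Assumed cut-offs in completed years; the source does not define them.
SCHOOL_ENTRY_AGE = 6
ADULT_AGE = 18


def age_in_years(birth_date: date, on: date) -> int:
    """Return the age in completed years on the reference date."""
    years = on.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def classify(birth_date: date, on: date) -> AgeGroup:
    """Map a date of birth to one of the three proposed categories."""
    age = age_in_years(birth_date, on)
    if age < 0:
        raise ValueError("birth date lies after the reference date")
    if age < SCHOOL_ENTRY_AGE:
        # Assumption: pre-school children are grouped with newborns.
        return AgeGroup.NEWBORN
    if age < ADULT_AGE:
        return AgeGroup.SCHOOL_AGED
    return AgeGroup.ADULT
```

Keeping the cut-offs as named constants means a registry team could adjust them without touching the classification logic.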
Based on the analysis of other twin registries' achievements and FG discussions, the suggested framework was designed for the Iranian twin registry. The proposed hierarchical model of different data levels in the twin registry is shown in Fig. 3.
Twins' data can be obtained at three levels. At the lowest level, raw data can be collected from distributed resources, which include all of the studies and research that have been conducted on twins.
Aggregating these raw data is very difficult because they do not share the same structure. Additionally, the distributed nature of these kinds of data is another challenge. If such information is to be used, it must be cleaned and then aggregated according to the standard data format of the registry.
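The clean-then-aggregate step for raw data can be sketched as follows. This is a minimal Python illustration under stated assumptions: the standard record layout (`twin_code`, `birth_date`, `sex`), the field-name aliases, and the two accepted date formats are hypothetical and are not taken from the registry's actual data dictionary, which the study leaves for future work.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class StandardRecord:
    """Hypothetical standard format for one twin."""
    twin_code: str
    birth_date: date
    sex: str  # "F", "M", or "U" (unknown)


# Hypothetical field-name variants found in differently structured sources.
FIELD_ALIASES = {
    "twin_code": ("twin_code", "id", "code"),
    "birth_date": ("birth_date", "dob", "date_of_birth"),
    "sex": ("sex", "gender"),
}
# Assumed date layouts; real sources may need more (for example, other calendars).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def _pick(raw: Dict[str, object], field: str) -> Optional[str]:
    """Return the first non-empty value stored under any alias of the field."""
    for key in FIELD_ALIASES[field]:
        value = raw.get(key)
        if value is not None and str(value).strip():
            return str(value).strip()
    return None


def _parse_date(text: str) -> Optional[date]:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(raw: Dict[str, object]) -> Optional[StandardRecord]:
    """Convert one raw record to the standard format, or None if unusable."""
    code = _pick(raw, "twin_code")
    dob_text = _pick(raw, "birth_date")
    birth = _parse_date(dob_text) if dob_text else None
    if code is None or birth is None:
        return None
    initial = (_pick(raw, "sex") or "").upper()[:1]
    return StandardRecord(code, birth, initial if initial in ("F", "M") else "U")


def aggregate(sources: Iterable[Iterable[Dict[str, object]]]) -> List[StandardRecord]:
    """Clean every source and keep the first valid record per twin code."""
    merged: Dict[str, StandardRecord] = {}
    for source in sources:
        for raw in source:
            record = clean(raw)
            if record is not None:
                merged.setdefault(record.twin_code, record)
    return list(merged.values())
```

Records that cannot be mapped to the standard format are dropped here for brevity; a real registry would more likely queue them for manual review.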
At the next level, twins' data can be obtained from different platforms. Data at this level have a more specific and defined structure based on their origin. Data can be collected directly through the voluntary collaboration of twins in different age groups, using electronic or paper forms prepared by twin registry executives, or through governmental systems at the national level. These governmental and clinical resources include the national infant registry, hospital systems, and the Ministry of Education systems for gathering school-aged twins' data. The existence of a student information system in Iran can facilitate the collection of school-aged twins' data.
At the third level, after the implementation of the twin registry system, information should be entered manually by trained registrars in different sections based on the age groups. The data entered in this section mainly include all data related to tracking twins after they enter the registry. At higher levels, data from the different levels are aggregated, processed, and provided to managers and researchers as information in the form of various reports for further research.
Finally, if the system is equipped with intelligent tools such as data mining or artificial intelligence tools, knowledge can be obtained by automatic analysis of the information. This knowledge is very useful for generating hypotheses and achieving valuable results.
The proposed framework in this section is based on the study of established twin registers around the world and the analysis of the collected opinions of experts who are interested in twin studies.
Recognizing the different levels of data can assist researchers to identify the appropriate data sources.
Designing a workflow model of information gathering:
A review of the retrieved studies indicated that twin registries employ different methods for collecting and saving data. These methods vary from country to country. The applied method appears to depend on several factors, such as infrastructure, facilities, government policies, scope, and the main purpose of the twin registry.
The information flow for data gathering in the Iranian twin registry was designed based on our objectives, the analysis of the retrieved studies, and expert consensus, to prevent irrelevant processes.
The optimized model was approved after agreement among the experts over several discussion sessions and based on their feedback. The final model is represented in Fig. 4.
As is apparent from the model, the first step in adding twins' information to the twin registry is saving the essential primary information. Next, after the accuracy of the primary information has been verified, the complementary data of the twins are added to the system database. This information can be synchronized with existing database information and added to the twin registry database. If the registry is equipped with a biobank, the biogenetic data must be linked to the original data using the specified code of each twin.
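The order of these steps can be expressed as a small state check: complementary data and biobank links are accepted only after the primary information has been verified. The Python sketch below is illustrative only; the class names, the required primary fields, and the in-memory storage are assumptions for this example and do not describe the registry's actual software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class NotVerifiedError(Exception):
    """Raised when data are added before the primary information is verified."""


@dataclass
class TwinEntry:
    twin_code: str
    primary: Dict[str, str]
    verified: bool = False
    complementary: Dict[str, str] = field(default_factory=dict)
    biobank_samples: List[str] = field(default_factory=list)


class Registry:
    # Assumed minimum set of primary fields; the study does not list them.
    REQUIRED_PRIMARY = ("name", "birth_date", "mother_id")

    def __init__(self) -> None:
        self._entries: Dict[str, TwinEntry] = {}

    def save_primary(self, twin_code: str, primary: Dict[str, str]) -> TwinEntry:
        """Step 1: store the essential primary information (still unverified)."""
        missing = [key for key in self.REQUIRED_PRIMARY if not primary.get(key)]
        if missing:
            raise ValueError(f"missing primary fields: {missing}")
        entry = TwinEntry(twin_code, dict(primary))
        self._entries[twin_code] = entry
        return entry

    def verify(self, twin_code: str) -> None:
        """Step 2: mark the primary information as checked by a registrar."""
        self._entries[twin_code].verified = True

    def _verified_entry(self, twin_code: str) -> TwinEntry:
        entry = self._entries[twin_code]
        if not entry.verified:
            raise NotVerifiedError(f"{twin_code}: primary information not verified")
        return entry

    def add_complementary(self, twin_code: str, data: Dict[str, str]) -> None:
        """Step 3: add complementary data, only for verified entries."""
        self._verified_entry(twin_code).complementary.update(data)

    def link_sample(self, twin_code: str, sample_id: str) -> None:
        """Step 4: link a biobank sample to the twin through the twin's code."""
        self._verified_entry(twin_code).biobank_samples.append(sample_id)
```

Routing both later steps through one `_verified_entry` check keeps the verification rule in a single place.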
Furthermore, we found that the mother's information could be the first step in classifying twins. Therefore, the mother's information was used as a primary key in our database, and all of the twins' data are categorized based on the mother's data to track twins. All information defined for the three age groups can be categorized based on the mother's information of each twin. This type of categorization makes the comparison of identical and non-identical twins' data easier.

Discussion:
Owing to the importance of twin studies, established twin registries around the world were examined from the viewpoint of medical informatics. The results revealed that national twin registries are not restricted to a single database or data repository. Instead, they are defined as a systematic process that comprises setting up research committees, creating structured minimum datasets, designing databases, and gathering health-related information. In addition, we found that registry implementation and digitization of all twins' data at the national level could facilitate managing, analyzing, retrieving, and manipulating data for twin studies [16,28]. Although conceptual frameworks can be used to describe the flow of information and how to manage data, so far no specific logical model has been proposed for managing and organizing twin information at the national level. The authors believe that providing a logical model fills research gaps in organizing twins' health data and implementing a national twin registry. Hence, a standard framework for collecting and recording Iranian twins' health data was defined based on a combination of literature review and expert consultation.
Owing to the geographical spread of the twin population and scattered data sources, data collection is one of the greatest challenges that all of the registries encountered. At the national level, population data sources are usually much broader. Therefore, the researchers concluded that defining a suitable framework for the different levels of data is essential to overcome the challenges of recording twins' data.
To overcome the difficulty of gathering twins' data from distributed sources, we suggest conducting a parallel multidisciplinary group study and placing twins in three different classifications. This classification makes it possible to access information and collect data in three different groups in parallel. To our knowledge, this solution has not been used by other twin registries.
The suggested framework (Fig. 3) can be considered a hierarchical model that is consistent with the information pyramid. Raw data are collected at the lowest level of the pyramid and are transformed into knowledge through the identified processes. Such knowledge can be useful for decision-makers and health policymakers. Furthermore, using the mother's data as the primary key could overcome the challenge of linking twins' data. To enhance accessibility, saving the twins' data based on the mother's information is the best strategy. This idea can also make designing the database easier.
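One way to read the mother-keyed design is as a relational schema in which each twin row references its mother's record, so co-twins can be retrieved with a single lookup. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names, column names, and allowed category values are assumptions for illustration and are not the registry's actual schema.

```python
import sqlite3
from typing import List

# Hypothetical schema: the mother's identifier is the primary key that
# groups twins; every twin row must reference an existing mother.
SCHEMA = """
CREATE TABLE mother (
    mother_id TEXT PRIMARY KEY
);
CREATE TABLE twin (
    twin_code TEXT PRIMARY KEY,
    mother_id TEXT NOT NULL REFERENCES mother(mother_id),
    age_group TEXT NOT NULL
        CHECK (age_group IN ('newborn', 'school-aged', 'adult')),
    zygosity  TEXT
        CHECK (zygosity IN ('monozygotic', 'dizygotic'))
);
"""


def open_registry() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def co_twins(conn: sqlite3.Connection, mother_id: str) -> List[str]:
    """Return the codes of all twins registered under one mother."""
    rows = conn.execute(
        "SELECT twin_code FROM twin WHERE mother_id = ? ORDER BY twin_code",
        (mother_id,),
    )
    return [row[0] for row in rows]
```

With foreign keys enforced, a twin cannot be stored without a registered mother, which is one possible way to support the linkage described above.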
Regarding the platform, studies suggested that a web-based platform could be the best solution. To ensure a reliable platform, the registry system should be designed based on the specific requirements of each country, according to its facilities, knowledge, infrastructure, and budget. Thus, it will vary from one country to another.
In summary, we integrated all of the findings and results into an overall model of the Iranian twin registry (Fig. 5).
Since the main objective of a twin registry is improving twin studies, we considered the follow-up section to gather relevant health-related data continuously.
Without defining this reasonable workflow, organizing twins' data at the national level is challenging. The proposed models for the Iranian twin registry cover all twin pairs in the country across different age groups, education levels, and social statuses. This rational workflow was defined based on comparative studies to facilitate data gathering. One of the significant benefits of defining a rational workflow is its ability to prevent the entry of invalid data. According to the standard workflow, data are not recorded in the database until the demographic data are verified. Since follow-up and surveillance are so important, a special health care plan was developed to improve the health of the enrolled twins.
In this survey, we also faced some limitations regarding access to information. Some registries have not published their accomplishments or characteristics in the form of scientific articles or other published resources. Hence, our results were completed by referring to their websites, sending emails, contacting twin registries, and consulting other unpublished resources. We should mention that the overall framework is based on the views of Iranian experts and their objectives, and it may not be generalizable to other studies.

Conclusion:
Taken together, the model proposed in this study, in addition to providing a standard method for collecting and organizing twins' data, can be considered the first step in setting up a twin registry.
Since other studies suggested that the best way to create a reliable platform is to design a system based on requirements, the model can also serve as a conceptual model for software development. However, the overall framework is based on the views of Iranian experts and their objectives, and it may not be generalizable to other studies.
Abbreviations
FG: Focus group

Declarations
Ethics approval and consent to participate
The study protocol was reviewed and approved by the ethical committee at the Tehran University of Medical Sciences (IR.TUMS.VCR.REC.1398.128). Informed verbal consent was obtained from participants after they had been informed about the details of the study before the focus groups and interviews.

Consent for publication
No individual details, images, or videos are included such that consent to publish is not applicable.