An Intelligent Prenatal Screening System for the Prediction of Trisomy-21

Objective: Early and accurate diagnosis of genetic diseases such as Down syndrome can lead to correct action and prevent irreversible events. Therefore, accurate and low-risk diagnostic methods for detecting these diseases in the fetal period are important. In this study, a combined artificial neural network (ANN) and genetic algorithm (GA) was used to predict Down syndrome from the first trimester screening test. To examine the proposed model, sample data were collected on 381 pregnant women who were referred to a private screening reference laboratory for the first trimester screening test and NT ultrasound between 11 and 13 weeks of gestation. The proposed model was a feedforward neural network whose structure and input parameters were determined by the GA. Results: Averaged over 10 runs, the developed model identified cases of Down syndrome with a specificity of 99.72%, a sensitivity of 90.91%, and a mean square error (MSE) of 0.61%. The results of this study showed that using a GA to optimize the structure of the neural network can increase the accuracy of diagnosing Down syndrome from first trimester screening test data.


1-Introduction
Down syndrome is caused by trisomy of human chromosome 21. With a prevalence of approximately 1 in every 800 births, it accounts for the largest share of genetic abnormalities (1). Children with this condition often have varying degrees of physical and mental problems, such as intellectual disability, communication difficulties, congenital heart disease, and thyroid disease (2,3).
Nowadays, Down syndrome can be diagnosed in the early stages of pregnancy by performing various tests. The routine tests used for this purpose are the first trimester screening tests. Although this screening is a non-invasive, low-risk procedure that requires only a maternal blood test and a fetal ultrasound, it has low accuracy (at best 90%) and can only estimate the likelihood of the disease (4,5).
Another non-invasive method is the measurement of cell-free DNA in the mother's plasma (cell-free DNA, or non-invasive prenatal test (NIPT)), which examines DNA released from placental cells and extracted from the mother's blood. The advantages of this method are its low risk and high accuracy. Its disadvantages are the need for special equipment, the high cost, the delay in reporting results, and the difficulty of isolating fetal DNA from the mother's blood (6,7).
Amniocentesis and chorionic villus sampling (CVS) are two invasive procedures that sample amniotic fluid and fetal villi, respectively. Despite their high accuracy (98-99%), these two methods increase the risk of miscarriage (1-2%) or harm to the mother. Because of their high cost and the fear of postpartum consequences for the baby, they are not considered reasonable for all pregnant mothers and are recommended only for those found to be at risk based on screening test results (5,8,9).
ANNs, one of the branches of machine learning, are a family of computational methods that have been widely used for classification and prediction over the last 25 years, especially in the field of medicine. These networks are mathematical algorithms that are trained on data and derive their knowledge from the relationships within the data. If properly trained, they can act like the human brain and, in some cases, identify complex nonlinear relationships between dependent and independent variables that are not detectable by the human brain (8,10).
GA is an evolutionary algorithm that is widely used in the optimization of machine learning systems and can provide a near-optimal solution using a random search technique (11,12). The algorithm models chromosome recombination in a process similar to what occurs in the meiotic stage of cell division. An initial set of solutions, called a population, is created randomly to solve a problem. Each solution is treated as a chromosome. Each time the recombination cycle is repeated, the chromosomes produce a new generation. Chromosomes whose phenotypes perform best in the neural network become the parents of the next generation. After several generations, the population converges to the best chromosome, which provides the desired architecture of the neural network (13,14).
In this study, an intelligent model was developed by combining the neural network technique with GA optimization. By examining various factors, the model aims to interpret screening tests with higher accuracy and less risk.

2-1 Dataset Collection
To create the required dataset, information was collected on 381 pregnant women who were referred to a private screening laboratory in Ahvaz, Iran, for the first trimester screening test and NT ultrasound between 11 and 13 weeks of pregnancy, from February 20, 2018 to February 20, 2019.

2-2 Hybrid ANN-GA
A neural network usually consists of an input layer, an output layer, and one or more hidden (middle) layers. The input layer contains the raw information given to the network for processing. The values in the input layer are received by the middle layer, which transforms them through a set of weights and sends the new values to the next layer. Finally, the output layer processes the received information and produces the output (15). The neural-genetic network was designed in MATLAB version 2015a. The ANN used in this study is a feedforward network trained with backpropagation (BP). Feedforward BP networks include an input layer, an output layer, and at least one layer of processing neurons (middle layer), and information flows in one direction from the input layer neurons to the output layer neurons. In other words, the output of each layer becomes the input of the next layer (16,17).
The ANN used in this study has an input layer, two hidden layers, and an output layer with a single neuron. The input layer has between 1 and 7 neurons, one for each study factor that is included. Each hidden layer has between 1 and 20 neurons, as determined by the GA. The transfer function of each layer is also selected by the GA. For training, the GA selects among 12 training functions and 2 learning functions. The output of the ANN distinguishes healthy cases from cases of trisomy 21 (T21).
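The layer-by-layer flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: the layer sizes (7-5-3-1), the random weights, and the three transfer functions (`tansig`, `logsig`, `purelin`, named after common MATLAB choices) are hypothetical, since the text does not name the three candidate transfer functions.

```python
import math
import random

# Three candidate transfer functions (assumed; the paper only gives codes 1 to 3).
def tansig(x):
    return math.tanh(x)

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):
    return x

def dense(inputs, weights, biases, transfer):
    """One layer: weighted sum of the inputs plus bias, passed through the transfer function."""
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights and biases for a layer with n_in inputs and n_out neurons."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Feedforward pass: the output of each layer is the input of the next."""
    for weights, biases, transfer in layers:
        x = dense(x, weights, biases, transfer)
    return x

# Hypothetical architecture: 7 inputs, hidden layers of 5 and 3 neurons, 1 output neuron.
rng = random.Random(0)
sizes = [7, 5, 3, 1]
transfers = [tansig, tansig, logsig]
layers = [
    init_layer(n_in, n_out, rng) + (f,)
    for n_in, n_out, f in zip(sizes, sizes[1:], transfers)
]
output = forward([0.5] * 7, layers)  # a single value in (0, 1)
```

The value of the single output neuron could then be thresholded (for example at 0.5, which is also an assumption) to separate healthy cases from T21 cases.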

2-3 Genetic Algorithm
The parameters that define the ANN structure and the input factors were encoded in haploid chromosomes containing 14 genes. The first two genes on each chromosome gave the number of neurons in the first and second hidden layers, with values from 1 to 20. The third to fifth genes determined the transfer function of the first hidden layer, the second hidden layer, and the output layer, and could take values from 1 to 3. The sixth gene selected the training function and could take values from 1 to 12. The seventh gene determined the learning function and took the value 1 or 2. The eighth to fourteenth genes each indicated the presence or absence of one of the seven ANN input factors. Figure 1 shows the structure of the GA and its operation to optimize the ANN prediction performance in the hybrid model.
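The 14-gene encoding can be made concrete with a small decoder. The sketch below follows the gene layout given in the text; the names `GENE_RANGES`, `NetworkSpec`, `decode`, and `random_chromosome` are hypothetical, and the repair step that forces at least one input factor to be switched on is an assumption motivated by the stated range of 1 to 7 input neurons.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# (low, high) bounds per gene, following the 14-gene layout in the text.
GENE_RANGES: List[Tuple[int, int]] = (
    [(1, 20), (1, 20)]   # genes 1-2: neurons in hidden layers 1 and 2
    + [(1, 3)] * 3       # genes 3-5: transfer function codes (two hidden layers, output)
    + [(1, 12)]          # gene 6: training function code
    + [(1, 2)]           # gene 7: learning function code
    + [(0, 1)] * 7       # genes 8-14: presence (1) or absence (0) of each input factor
)

@dataclass
class NetworkSpec:
    hidden1: int
    hidden2: int
    transfer_codes: Tuple[int, int, int]
    training_code: int
    learning_code: int
    input_mask: Tuple[bool, ...]

def decode(chromosome: List[int]) -> NetworkSpec:
    """Turn a 14-gene chromosome into a network specification, validating each gene."""
    if len(chromosome) != len(GENE_RANGES):
        raise ValueError("chromosome must contain 14 genes")
    for position, (gene, (lo, hi)) in enumerate(zip(chromosome, GENE_RANGES), start=1):
        if not lo <= gene <= hi:
            raise ValueError(f"gene {position} = {gene} is outside [{lo}, {hi}]")
    return NetworkSpec(
        hidden1=chromosome[0],
        hidden2=chromosome[1],
        transfer_codes=tuple(chromosome[2:5]),
        training_code=chromosome[5],
        learning_code=chromosome[6],
        input_mask=tuple(bool(g) for g in chromosome[7:]),
    )

def random_chromosome(rng: random.Random) -> List[int]:
    """Draw each gene uniformly from its range (assumed initialization scheme)."""
    genes = [rng.randint(lo, hi) for lo, hi in GENE_RANGES]
    if not any(genes[7:]):
        # Assumed repair: a network needs at least one input factor.
        genes[7 + rng.randrange(7)] = 1
    return genes

spec = decode([10, 5, 1, 2, 3, 12, 2, 1, 0, 1, 0, 1, 0, 1])
```

In this example, `spec` describes a network with 10 and 5 hidden neurons that uses four of the seven input factors.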
The chromosomes of the initial population were created randomly. After the expression of each chromosome was evaluated, the superior chromosomes were selected and paired for recombination to create the next generation. Chromosomes scored higher when decoding them produced networks with lower MSE and greater accuracy. After a fixed number of generations, the ANN with the lowest MSE was taken as the final optimal network.

2-4 Data analysis
The performance of the generated model was evaluated by three indicators: sensitivity, specificity, and accuracy, which are calculated from the confusion matrix (18-20).
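For reference, the three indicators follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions, with a Down syndrome case treated as the positive class; the function name and the example counts are hypothetical and chosen only for illustration, not taken from Table 1.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: affected cases flagged as affected    fn: affected cases missed
    tn: healthy cases flagged as healthy      fp: healthy cases flagged as affected
    """
    sensitivity = tp / (tp + fn)                 # share of affected cases detected
    specificity = tn / (tn + fp)                 # share of healthy cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy

# Illustrative counts only: 22 affected and 286 healthy cases.
sens, spec, acc = screening_metrics(tp=20, fn=2, tn=285, fp=1)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
# prints: sensitivity=0.9091 specificity=0.9965 accuracy=0.9903
```

With few affected cases, a single missed case changes the sensitivity by several percentage points, so sensitivity is the least stable of the three indicators on a dataset of this kind.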

3-1 Study Data
Of the 381 patients, 73 were excluded due to missing information. The remaining 308 cases were included in the study, of which 286 were healthy and 22 had Down syndrome.

3-2 Designed model
To train the designed model, 80% of the data were selected randomly, and the remaining 20% were allocated to model testing. Each generation of the GA had a population of 120 chromosomes. The probabilities of one-point and two-point crossover were 0.4 and 0.6, respectively, and crossover was applied to 0.8 of the total population. Mutation was applied at random to 0.6 of the population. The remaining 0.2 of the next generation's population was selected from the better-performing chromosomes and mutants of the current population.
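One way to read these settings is sketched below in Python. It is an interpretation under stated assumptions, not the authors' MATLAB code: parents are drawn from the better half of a ranked population (truncation selection, which the text does not specify), mutation resets one random gene of a child with probability 0.6, and the 0.2 of the population not produced by crossover is copied from the best chromosomes. All function names are hypothetical.

```python
import random

# Gene bounds for the 14-gene chromosome described in the text.
GENE_RANGES = [(1, 20)] * 2 + [(1, 3)] * 3 + [(1, 12)] + [(1, 2)] + [(0, 1)] * 7

def one_point(a, b, rng):
    """Swap the tails of two parents after a single cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b, rng):
    """Swap the segment between two cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def crossover(a, b, rng, p_one_point=0.4):
    """One-point crossover with probability 0.4, otherwise two-point (0.6)."""
    operator = one_point if rng.random() < p_one_point else two_point
    return operator(a, b, rng)

def mutate(chromosome, rng):
    """Reset one randomly chosen gene to a random value within its range (assumed operator)."""
    child = list(chromosome)
    k = rng.randrange(len(child))
    lo, hi = GENE_RANGES[k]
    child[k] = rng.randint(lo, hi)
    return child

def next_generation(ranked, rng, crossover_fraction=0.8, mutation_rate=0.6):
    """Build a new population from chromosomes sorted best first (lowest MSE first)."""
    size = len(ranked)
    n_children = int(round(crossover_fraction * size))
    n_children -= n_children % 2            # crossover yields children in pairs
    parents = ranked[: size // 2]           # assumed truncation selection
    children = []
    while len(children) < n_children:
        a, b = rng.sample(parents, 2)
        for child in crossover(a, b, rng):
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
    elites = [list(c) for c in ranked[: size - len(children)]]  # the remaining 0.2
    return elites + children
```

With a population of 120, this produces 96 children by crossover and carries over the 24 best chromosomes unchanged, so the population size stays constant from one generation to the next.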

3-3 Classification results
After 10 runs of the hybrid ANN-GA model using the training dataset, the model identified Down syndrome cases on all data with an average accuracy of 96.32%, specificity of 99.72%, sensitivity of 90.91%, and MSE of 0.61%. On the testing dataset, the average accuracy was 82.27%, with specificity of 98.62%, sensitivity of 75%, and MSE of 2.17%. The sensitivity, specificity, accuracy, and MSE of each of the 10 runs of the hybrid ANN-GA model are listed in Table 1.

4-Discussion
In recent years, machine learning techniques, and especially ANNs, have provided robust results in the diagnosis of fetal abnormalities (19,21,22). However, the capacity of an ANN for early detection of a disease depends on finding the optimal ANN architecture (13). This study showed that a GA can be used to obtain an optimal ANN architecture for early detection of Down syndrome. In recent years, similar studies have used ANNs to detect aneuploidy abnormalities; their results are shown in Table 2. Because screening test results are ambiguous, they are usually interpreted by software, and the mother's age and weight are also used to improve the interpretation. Since Down syndrome affects the fetus's appearance, NT sonography also has a significant effect on the interpretation of such tests (25).
The hybrid model can provide a clear, simple, fast, and acceptable interpretation of these tests.

5-Conclusion
The results of this study showed that using a GA to optimize the structure of the ANN can increase the accuracy of diagnosing Down syndrome from first trimester screening test data. The GA and its search strategies improve the results of these networks by finding optimal ANN structures.
The results showed that tuning the ANN with a GA is more efficient and cost-effective than manual methods.

6-Limitation
The main limitation of this study was the small number of abnormal cases. More definitive results could be expected from the model if more abnormal cases were available.

List Of Abbreviations
List of abbreviations used in article

Consent for publication
To preserve the privacy and confidentiality of the individuals whose information was used in this research, personal information was not available to the researchers, and only laboratory, clinical, and social information was accessible.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
This paper was financially supported by a grant from the Vice Chancellor for Research Affairs of Ahvaz Jundishapur University of Medical Sciences. This funding source had no role in the design of this study and will not have any role in its execution, analyses, interpretation of the data, or the decision to submit results.

Authors' contributions
The authors' responsibilities were as follows: AM suggested the study; AM and SMH designed the study and conducted independent literature searches; SMH extracted the data; AM and SMH performed the statistical analysis, interpreted the findings, and wrote the manuscript. JM-A and MM helped in designing the study and interpreting the findings. All four authors read and approved the final manuscript.