AI approach for Age and Sex estimation
Sex and age estimation are basic steps in forensic identification. Sex estimation alone excludes roughly half of the population [13]. Similarly, age estimation is of great importance after earthquakes, genocide and other mass-casualty events. Although very few studies have examined the use of AI and ML in forensics, the studies selected for this review show that these methods can predict sex and age accurately and faster than radiologists, thereby decreasing reporting time [14].
For this systematic review, we analyzed studies that use AI-based algorithms from a forensic perspective. In these studies, training and test sets were built from images acquired with different radiology modalities, and the resulting models were reported to outperform visual assessment and to be free of observer bias.
4.1 Age estimation
Age estimation for forensic and legal purposes can be performed with established radiological methods based on visual examination of bone ossification, skull sutures, the pubic symphysis, calcification, skeletal maturity and degenerative changes [15]. Bone age (BA) is a measure of skeletal maturity. BA is also used for detecting developmental abnormalities, monitoring growth hormone, estimating height and detecting endocrine disorders [16]. Hand radiographs are most commonly used for assessing BA because they are easy to acquire, show multiple bones in a single image and involve a lower exposure dose than radiographs of other body parts. The two most widely used methods for age estimation are the Greulich-Pyle (GP) method and the Tanner-Whitehouse (TW) method, both based on hand radiographs of children [17]. The GP method matches the radiograph to reference images from a standard atlas, whereas the TW method is based on scoring. The GP method is older than the TW method and easier to use. The GP method was developed in 1917-1942 using x-ray images of Caucasian children, whereas the TW method was developed in 1950 using British children and revised in 2001 using radiographs from different ethnic populations, reducing the 20-bone score to a 13-bone score. In adults, age can be determined by studying the developing dentition, degenerative changes of the skeleton, fusion of the epiphyses, and bone deposition, remodelling and resorption. All of these methods are influenced by the subjectivity of the examiner; to overcome this bias, more accurate and reliable methods are needed. Several automated image analysis methods have been developed in the past.
D. Giordano in 2015 examined 360 hand radiographs covering the age range 0-6 years, with equal numbers of male and female radiographs [18]. A Hidden Markov Model (HMM) with a modified TW-2 method was used. The reported mean error was 0.41±0.33 years, indicating high accuracy. The main drawback of the study was the limited age range to which the method can be applied.
A study conducted by Darko Stern and Martin Urschler in 2016 used 3D hand images from a database of 132 MRI scans to build a regression algorithm that fuses developmental information from different bones of the hand. Its results were in line with those of radiologists, with an absolute deviation of 0.82±0.56 years from chronological age [19]. The authors chose Random Regression Forests (RRF), a non-linear approach that provides a better overview of the features during tree construction. The method used different fusion strategies and assumed that the hand bones had already been localized, aligned and cropped.
In 2018 Darko Stern et al. conducted another study using Random Forests and deep convolutional methods on 3D hand MRI images. The models were trained on 328 MRI images and obtained a mean absolute error of 0.37±0.51 years for subjects aged 18 years or younger [20]. All MRI examinations were performed on a 3.0 T scanner, and the subjects' age range was 13-25 years; 141 of the selected subjects were 18 years old or younger. This multifactorial method was further tested on 2D images from publicly available radiograph datasets, and the results were comparable to those of other automatic methods developed for x-ray images. A positive attribute of this study was the use of a non-ionizing imaging technique.
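Many of the studies in this section report error as "mean ± standard deviation" of the absolute difference between estimated and chronological age. The following minimal sketch shows one way such a figure can be computed. The function name, the sample values and the choice of the sample (n-1) standard deviation are assumptions made for illustration; the cited studies do not state which variant they used.

```python
from statistics import mean, stdev


def mae_with_sd(predicted_ages, chronological_ages):
    """Return (mean, sample SD) of the absolute errors, in years."""
    if len(predicted_ages) != len(chronological_ages):
        raise ValueError("inputs must have the same length")
    if len(predicted_ages) < 2:
        raise ValueError("at least two subjects are needed for a standard deviation")
    abs_errors = [abs(p - c) for p, c in zip(predicted_ages, chronological_ages)]
    return mean(abs_errors), stdev(abs_errors)


# Hypothetical values for illustration only (not data from any cited study).
predicted = [14.2, 16.9, 17.5, 13.1]
chronological = [14.0, 17.5, 17.0, 13.4]

mae, sd = mae_with_sd(predicted, chronological)
print(f"MAE = {mae:.2f} ± {sd:.2f} years")
```

Because the absolute error cannot be negative, a standard deviation larger than the mean (as in 0.37±0.51 years) suggests a skewed error distribution with a few large errors.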
Another article by Darko Stern et al. in 2019 used a deep convolutional neural network with three anatomical areas as input (hand, clavicle and teeth) [21]. The age range was 13-25 years, and the same multifactorial, non-ionizing MRI technique as in the study above was used. The authors proposed fusing three different anatomical sites for age estimation. Data were collected from a total of 322 Caucasian male subjects, of whom 134 were below 18 years. According to the authors, this study was the first approach to automatically evaluate information from different anatomical areas over a wide age range for age estimation. A prediction error of 1.01±0.74 years was reported. The results also showed that regression-based methods were more suitable for the study. Special attention is nevertheless needed, because biological variation within the same age range can result in a minor being wrongly predicted as an adult, or an adult as a minor.
J.R. Kim et al. in 2017 used a program based on GP and deep learning for BA evaluation in the age range 3-17 years [22]. A total of 200 left-hand radiographs were studied in three configurations: the software alone, two radiologists assisted by the software, and the radiologists using the GP atlas. The study confirmed an increase in accuracy with the software, together with an increase in efficiency and a decrease in reading time. The concordance rate was 69.5% for the software alone; for the two radiologists it was 57.5% and 72.5% with the software, compared with only 49.5% and 63% with the GP atlas. The automatic system returned three estimated bone age values, ranked by probability. Three limitations were noted in this study. First, the sample size was small. Second, the estimated bone age differed from the actual chronological age in some cases. Third, children below 2 years were not studied, as the software was based on the GP atlas method.
J. H. Lee et al. used a deep convolutional neural network in 2018, training a regression model on a set of hand radiographs [23]. The deep learning framework Caffe was used. The mean absolute difference was 18.9 months and the concordance correlation coefficient was 0.78. Because the training data set was small, training and test performance differed.
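The two agreement measures reported above can be sketched as follows. The code assumes Lin's concordance correlation coefficient with population (1/n) moments, which is the standard definition; the source does not state which estimator was used. The function names and the sample values are hypothetical.

```python
from statistics import mean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two sets of ages."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mx - my) ** 2
    if denominator == 0:
        raise ValueError("undefined when both inputs are the same constant")
    return 2 * cov / denominator


def mean_absolute_difference_months(estimated_years, reference_years):
    """Mean absolute difference between two age series, converted to months."""
    diffs = [abs(e - r) for e, r in zip(estimated_years, reference_years)]
    return 12 * mean(diffs)


# Hypothetical ages in years, for illustration only.
estimated = [10.0, 12.5, 8.0, 15.0]
reference = [11.0, 12.0, 9.5, 14.0]
print(round(concordance_correlation(estimated, reference), 3))
print(round(mean_absolute_difference_months(estimated, reference), 1))
```

Unlike the Pearson correlation, the concordance coefficient is lowered by a systematic offset between the two series, so a model that is consistently one year too high does not score 1.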
T.D. Bui in 2019 proposed a model based on TW-3 and deep convolutional networks to improve bone age accuracy [24]. The method uses 6 regions of interest (ROIs) from TW-3 and has two phases: ROI detection, followed by ROI classification using regression networks. The study showed that the TW-3-based deep CNN overcomes the limitations of the GP-based deep CNN. Future work should also include analysis of the whole image.
A study conducted in 2021 by C V Pham et al. estimated the age of adults, investigating 814 adults in the age range 20-70 years. It was a fully automated approach that took post-mortem CT (PMCT) scans as input and predicted age as output from two bones, the mandible and the femur, using a model based on a 3D CNN [25]. The results showed that the femur alone was a better age indicator than the mandible alone. Combining the mandible and femur further improved accuracy and decreased the mean absolute error. Most of the models discussed above were developed for children, whereas this model estimates age in adults and uses a different modality. Further validation of this model on different ethnicities is required, and other age indicators could be segmented together to obtain a more precise and accurate age estimate.
David B Larson et al. concluded that a DL CNN model can estimate age with accuracy similar to that of a radiologist in much less reading time [26]. The model used hand x-ray images as input. Performance was measured as root mean square (RMS) error and mean absolute difference, which were 0.63 and 0.50 years respectively; the RMS for the Digital Hand Atlas data set was 0.73 years.
Another study, published in 2017 by Hyunkwang Lee, also used a CNN to develop a fully automated program that segments the ROI, pre-processes the input radiographs and performs BA assessment [27]. Hand radiographs from the age group 5-18 years were selected. Interpretation time was 2 seconds or less, and accuracy was 61.4% in females and 57.32% in males. Although the results are promising, the accuracy in this study was lower than in the studies above. A CNN-based article published in 2018 by S.H. Tajmir also selected the age range 5-18 years. Accuracy was 68.2%, the root mean square error was 0.548 years and 98.6% of estimates were within 1 year [28]. The combined accuracy of AI and radiologist was higher than that of AI alone or the radiologist alone.
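The root mean square error and the "within 1 year" rate reported above summarise the same set of errors in different ways. A minimal sketch of both is given below; the function names, the tolerance parameter and the sample values are hypothetical, and the exact definitions used in the cited studies may differ.

```python
from math import sqrt


def rmse(estimated_ages, reference_ages):
    """Root mean square error between estimated and reference ages."""
    squared = [(e - r) ** 2 for e, r in zip(estimated_ages, reference_ages)]
    return sqrt(sum(squared) / len(squared))


def fraction_within(estimated_ages, reference_ages, tolerance_years=1.0):
    """Share of estimates whose absolute error does not exceed the tolerance."""
    hits = sum(
        1
        for e, r in zip(estimated_ages, reference_ages)
        if abs(e - r) <= tolerance_years
    )
    return hits / len(estimated_ages)


# Hypothetical ages in years, for illustration only.
estimated = [10.0, 12.0, 15.0, 9.0]
reference = [10.5, 12.0, 13.0, 9.5]
print(round(rmse(estimated, reference), 3))
print(fraction_within(estimated, reference))
```

Squaring weights large errors more heavily, so the RMS error is never smaller than the mean absolute difference computed on the same data. This is consistent with the 0.63 and 0.50 years reported by Larson et al.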
A fully automatic 2D knee segmentation based on 3D MRI images and a CNN was developed by Paul-Louis Prove et al. in 2019. Multiple pre-processing steps were used to correct image intensity and reduce image size [29]. Initial results gave a mean absolute error (MAE) of 0.48±0.32 years for a test set of 14 subjects. The proposed method is expected to provide reproducible and faster BA assessment in future.
In 2020, an article by Markus et al. used 3D MRI knee scans with a CNN and tree-based ML program for age estimation in a Caucasian population aged 13-21 years; 404 sagittal and 185 coronal MRI images were studied [30]. The methodology had three main steps: image pre-processing, bone segmentation and final age estimation. The MAE was 0.67±0.49 years, with an accuracy of 90.9%, a sensitivity of 88.6% and a specificity of 94.2%. The results are promising for future applications, but a larger training data set is required.
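Accuracy, sensitivity and specificity apply to a binary decision rather than to a continuous age estimate. The sketch below assumes that the decision is whether a subject has reached a legal age threshold. The 18-year threshold, the choice of "at or above the threshold" as the positive class, the function name and the sample values are all assumptions made for illustration; the source does not state how the classes were defined.

```python
def threshold_metrics(estimated_ages, true_ages, threshold=18.0):
    """Accuracy, sensitivity and specificity for an age-threshold decision."""
    tp = tn = fp = fn = 0
    for estimated, true in zip(estimated_ages, true_ages):
        predicted_positive = estimated >= threshold  # model says "adult"
        actually_positive = true >= threshold        # subject is an adult
        if predicted_positive and actually_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actually_positive:
            fn += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of true adults that the model classifies as adults.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of true minors that the model classifies as minors.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical ages in years, for illustration only.
estimated = [17.0, 19.0, 18.5, 16.0, 17.5, 20.0]
true = [17.5, 18.2, 17.9, 16.5, 18.1, 21.0]
print(threshold_metrics(estimated, true))
```

Under these assumptions, a false positive is a minor classified as an adult, which is the more serious error in most legal settings, so specificity deserves particular attention.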
Another MRI-based study, by Fuk Hay Tang et al., was published in 2018. An ANN and the TW3 method were applied to 79 subjects aged 12-17 years from a Chinese population [31]. Coronal MRI sections of the left hand and wrist were used as input. Independent indicators such as the subjects' height and weight, and the intensity and composition of bone marrow quantified by MRI, were also used as input. This ML approach was reported to be 10-fold more accurate than the TW3 method alone.
A study by Christian Booz et al. demonstrated the accuracy and efficiency of BoneXpert 2.1 in BA assessment irrespective of gender, with a shorter reading time than the GP method. A total of 514 patients aged 3-17 years were selected from the German population, and left hand and wrist radiographs were analysed using AI. The correlation between AI and the reference was significantly higher (r=0.99) than that of the other method (r=0.90), and the mean reading time was reduced by 87% [32].
Spampinato studied automatic BA assessment in the age range 0-18 years. A CNN, BoNet, was used to assess BA on a publicly available dataset comprising different races, genders and age ranges [33]. DL techniques were applied, and the results showed an average difference of 0.8 years between manual and automatic evaluation. The authors also tested existing, already pretrained CNNs such as GoogLeNet, OxfordNet and OverFeat on a dataset of 1400 radiographs. They noted that more convolutional layers do not always mean higher performance; in their study, only 5 layers gave the desired performance.
4.2 Sex determination
Skeletal bones play a key role in gender estimation. Gender identification is crucial whenever skeletal remains are found, for medicolegal and courtroom purposes, and radiology modalities make it easier. The skull and pelvis play a vital role: the shape and size of these bones differ between males and females, which is the common criterion for differentiating male and female skeletons [34]. Sex prediction is a crucial part of the biological profile built by a forensic expert, and anthropologists study different biomarkers when estimating sex [35]. Almost all bones have been studied in the literature, and accuracy has been compared among different populations. The femur, patella, wrist, clavicle, sternum, mandible and metatarsals are used in sex prediction [15, 36]. Costal cartilage calcification has also been studied for estimating sex [37, 38]. With advances in AI technology, the accuracy and specificity of sex prediction have increased, as shown by the studies discussed here.
Mumtaz A. Kaloi in 2018 proposed a CNN-based network for gender determination using left-hand radiographs of children aged 1 month to 18 years [39]. The accuracy was 98%. Class Activation Mapping indicated that the carpals, ulna and radius were more specific for gender determination than the upper hand bones.
Wen Yang et al. in 2019 proposed an improved backpropagation neural network (BPNN) for sex estimation from the skull [40]. A total of 267 Chinese skulls from a whole-skull CT database were studied, of which 153 were female, with an age range of 18-88 years. Six parameters were used as input, and the AdaBoost algorithm was used to improve generalization ability. The accuracy was 96.76% with a mean square error of 0.01.
Yongjie Cao et al. developed CNN models based on pelvic regions and compared them with two forensic anthropologists using morphological methods [41]. A dataset of 862 subjects was divided into training, validation and test sets. CNN models were developed for the ventral pubis, dorsal pubis, greater sciatic notch, pelvic inlet, ischium and acetabulum. Except for the last two, the prediction metrics were over 0.9, and the accuracy was higher than that of the two experts. The accuracy was 98.2% for the pelvic inlet.
James Bewes et al. trained a deep learning ANN on 900 virtual skulls reconstructed from CT images [42]. The proposed model reached an accuracy of 95%; the authors stated that once the program is trained it is rapid to use and removes human bias. 500 subjects of each sex in the age range 18-60 years were selected, and the CT images were collected from the PACS database. The study population was mainly European, with three non-European populations also represented (Chinese, Indian, Vietnamese). This ethnic diversity was a positive aspect compared with previous studies. GoogLeNet, a CNN, was adapted for the study and transfer learning was performed in MATLAB. The final three layers of the network were modified to classify each input image as ‘male’ or ‘female’.
A unique study based on the grey matter of the brain, conducted by Nathaniel E. Anderson et al., was published in 2018. MRI scans of 1300 subjects aged 12-66 years were used to train and test an ML model [43]. The accuracy was greater than 93%. The orbitofrontal and frontopolar regions were larger in females, and the anterior medial temporal region was larger in males. The authors also replicated the accuracy in a healthy control sample. MRI-based methods are being developed but require more research with different skeletal structures.
A retrospective study by Maryam Farhadian et al. was conducted in an Iranian population aged 18-70 years. Cone beam CT images of the mastoid region were collected from 190 subjects [44]. A total of 9 landmarks were measured, and the mastoid process measurements were analysed with different data mining algorithms. The models were compared using cross-validation. The random forest model performed best with 97% accuracy and the ANN model worst with 84% accuracy; the remaining 5 models all showed accuracy greater than 90%. The intermastoid distance and the distance between the most prominent convex mastoid points played the largest role in sex estimation. The authors concluded that finding the best method for a given dataset is challenging and requires extensive investigation, and that further studies on different populations are required.
A paper published in 2015 by M.F. Darmawan et al. analysed the left-hand bones of Asian children for sex estimation with three different models: Discriminant Function Analysis (DFA), Support Vector Machine (SVM) and ANN [45]. A total of 333 x-ray images of children below 19 years were used, and nineteen hand bones were studied using Free Image Software and MATLAB. The age groups 16-19 years (96.6% accuracy) and 7-9 years (80.49% accuracy) were considered best for sex estimation, as the average accuracy for these groups was more than 80%. In these age groups the ANN model was superior to the other two models in accuracy. The results showed that each model achieved its best accuracy in a different age group.
Similarly, another ANN-based study, by Fabio Cavalli et al., determined sex from the shape of the calvarium using 1700 lateral CT scans from subjects aged 25-92 years [46]. The sample consisted of Caucasian males and females (850 each). The study yielded an accuracy higher than 80%. The authors suggested applying such a model to other bones to improve the proposed methodology.
An ML-based study using ANN, SVM and logistic regression (LR) generated models for sex estimation based on measurements of 47 landmarks on the cranium [47]. 3D CT images of 393 Bulgarian adults were selected. The accuracy of the models was evaluated by 10×10-fold cross-validation and was greater than 95%. The SVM gave the best accuracy. The authors, Diana Toneva et al., suggested using such models on other skeletal bones for forensic purposes.
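The 10×10-fold cross-validation mentioned above is usually understood as 10-fold cross-validation repeated 10 times with a fresh shuffle for each repetition, giving 100 held-out accuracy values to average. The sketch below illustrates that procedure. It is not the authors' pipeline: the nearest-class-mean classifier, the single measurement per skull and all names are hypothetical stand-ins, and the folds are not stratified by sex.

```python
import random
from statistics import mean


def repeated_kfold_accuracy(features, labels, fit, predict, k=10, repeats=10, seed=0):
    """Average held-out accuracy over `repeats` shuffled k-fold splits."""
    n = len(labels)
    if n < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k disjoint test folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = fit([features[i] for i in train_idx],
                        [labels[i] for i in train_idx])
            correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
            scores.append(correct / len(test_idx))
    return mean(scores)


def fit_nearest_mean(xs, ys):
    """Hypothetical toy classifier: store the mean measurement per class."""
    return {label: mean([x for x, y in zip(xs, ys) if y == label]) for label in set(ys)}


def predict_nearest_mean(model, x):
    """Assign the class whose stored mean is closest to the measurement."""
    return min(model, key=lambda label: abs(x - model[label]))


# Hypothetical, well-separated measurements (mm), for illustration only.
measurements = [100 + i for i in range(20)] + [130 + i for i in range(20)]
sexes = ["F"] * 20 + ["M"] * 20
print(repeated_kfold_accuracy(measurements, sexes, fit_nearest_mean, predict_nearest_mean))
```

Repeating the split reduces the dependence of the estimate on one particular partition of a small sample, which matters for datasets of a few hundred skulls.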
Sehyo Yune et al. in 2018 developed a CNN-based model for predicting sex from hand and wrist radiographs in a sample of 1531 with an age range of 5-70 years [48]. The results were then compared with those of radiologists, and the accuracy of the model was considerably higher: 95.9% for the CNN model against 58% for the two radiologists. This shows that DL can be used to identify patterns where human perception fails. Heat maps generated with class activation maps (CAM) showed that the model focused mainly on the 2nd and 3rd metacarpal base or thumb sesamoid in women, and on the distal radioulnar joint, distal radial physis and epiphysis, or 3rd metacarpophalangeal joint in men. When applied to a sample of subjects with disorders of sex development or who were transgender, the model's accuracy was 77.8%.
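The heat maps mentioned above can be illustrated with a minimal sketch of a class activation map. It assumes the original CAM formulation, in which the map for a class is the sum of the last convolutional feature maps weighted by that class's output weights after global average pooling. The source does not specify which CAM variant was used, and the function name and the tiny 2×2 maps are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one output class."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("one weight is required per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(feature_maps, class_weights):
        for row in range(height):
            for col in range(width):
                cam[row][col] += weight * feature_map[row][col]
    return cam


# Hypothetical 2 x 2 feature maps and class weights, for illustration only.
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
weights_for_class = [0.5, 1.0]
print(class_activation_map(feature_maps, weights_for_class))
```

In practice the resulting low-resolution map is upsampled to the size of the radiograph and overlaid on it, so that high values mark the regions that contributed most to the predicted class.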
We observed four major limitations in the studies discussed above:
1. Only selected sex and age groups were evaluated, and these covered a very narrow range.
2. Medical experts need training to visualize and interpret the results of different deep learning techniques.
3. Realistic application of these models in a forensic setup is required before they can be used in courtroom trials.
4. The output of the models depends on the input image scans; if the quality of the scans is not good enough, they might not be accepted by the model.