For the data collected in this study, basal AMH, FSH, LH (luteinizing hormone), E2 (estradiol), P (progesterone), T (testosterone), PRL (prolactin) and vitamin D were measured in all women between day 2 and day 4 of the menstrual cycle. On the same day as the hormonal assay, a single experienced investigator performed all ultrasound scans. Height and weight were also recorded. All indicators were measured by qualified professional technicians in the laboratory, in strict accordance with the reagent kit and instrument instructions.
2. Population data collection
2.1 Phase I data collection
From May 1 to June 30, 2020, data from subjects (including healthy, PCOS (polycystic ovary syndrome) and DOR (diminished ovarian reserve) populations) were collected in the outpatient department of gynecology and reproductive medicine of Jiangsu Province Hospital of Chinese Medicine to compare their chronological age with their predicted ovarian age. The experimental design fully considered the principles of safety and fairness, and the research did not pose any harm or risk to subjects. Recruitment was based entirely on the principles of voluntary participation and informed consent, and privacy was protected to the greatest extent possible. There was no conflict of interest.
2.2 Phase Ⅱ data collection
Between July 1 and August 31, 2020, paid recruitment of subjects was conducted through a contract research organization (CRO) to collect physical examination data from healthy women. All tests were performed in the Laboratory of Jiangsu Province Hospital of Chinese Medicine.
2.3 Phase Ⅲ data collection
The age range of those who participated in this study through the three recruitment methods described above was 15 to 55 years.
Inclusion criteria for the selection of the training subjects for this model were: a history of spontaneous conception(s), intact ovaries, and regular menses with a mean interval of 21 to 35 days.
Exclusion criteria were: estrogen or progestin use or breastfeeding in the two months before enrollment; pregnancy; history of female infertility; endometriosis; presence of ovarian follicles measuring more than 10 mm on ultrasonography at study entry or other cystic masses of the ovary; history of ovarian surgery; PCOS; gynecological malignancy; previous radiation or chemotherapy; autoimmune disease; known chronic, systemic, metabolic or endocrine disease, including hyperandrogenism, hyperprolactinemia, diabetes mellitus and thyroid diseases; hypogonadotropic hypogonadism; or a history of use of drugs that can cause menstrual irregularity.
PCOS was diagnosed according to the Rotterdam criteria when at least 2 of the following 3 features were present: oligo- or amenorrhea, clinical and/or biochemical hyperandrogenism, and polycystic ovarian morphology (PCOM).
The diagnosis of DOR has not been standardized, and there are no international criteria. In combination with the current guidelines of the National ART Surveillance System (NASS) and the Society for Assisted Reproductive Technology (SART), the diagnostic criteria used in this study were: age ≥40 years, basal FSH ≥12 mIU/mL, AFC ≤5~7, and AMH ≤1.1 ng/mL. Subjects meeting any 2 or more of these 4 items were classified as DOR.
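The two counting rules above ("at least 2 of 3" for PCOS, "any 2 or more of the 4 items" for DOR) can be illustrated with a minimal Python sketch. This is not the authors' code: the function and parameter names are hypothetical, and because the text gives the AFC threshold as a range (5~7), the cutoff is exposed as a parameter whose default of 7 is an assumption made only for illustration.

```python
def meets_rotterdam_pcos(oligo_or_amenorrhea, hyperandrogenism, pcom):
    """Return True when at least 2 of the 3 Rotterdam features are present."""
    return sum([bool(oligo_or_amenorrhea), bool(hyperandrogenism), bool(pcom)]) >= 2


def meets_dor_criteria(age_years, basal_fsh_miu_ml, afc, amh_ng_ml, afc_cutoff=7):
    """Return True when at least 2 of the 4 DOR items are met.

    afc_cutoff is an assumption: the text states the AFC threshold as a
    range (5~7), so the caller should pick the value used in practice.
    """
    items = [
        age_years >= 40,         # age >= 40 years
        basal_fsh_miu_ml >= 12,  # basal FSH >= 12 mIU/mL
        afc <= afc_cutoff,       # antral follicle count at or below cutoff
        amh_ng_ml <= 1.1,        # AMH <= 1.1 ng/mL
    ]
    return sum(items) >= 2
```

For example, `meets_dor_criteria(41, 13, 10, 2.0)` is true because the age and FSH items are both met, whereas a subject meeting only the AMH item is not classified as DOR.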
Recruitment through the outpatient department of gynecology and reproductive medicine, the CRO company, and the OvAge WeChat mini program was in all cases based on the inclusion and exclusion criteria, and past medical history was collected through a professional questionnaire.
4. Statistical analysis, model construction and optimization
A dataset of 149 subject records was analyzed using R (version 4.0.2). The analysis consisted of seven main steps (Figure 1).
(i) The main dataset was divided into three sub-datasets according to the preliminary clinical analysis: healthy controls (HCs), DOR and PCOS. Each subset was checked to ensure data quality in terms of missing values and consistency of the decimal separator. Missing data were handled by multiple imputation.
(ii) For each dataset, descriptive statistics were calculated for each variable. The rank correlations between chronological age and the other variables were analyzed with Spearman’s test, and the p-values were adjusted for multiple testing with Holm’s method.
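As an illustration of step (ii), the following standard-library Python sketch computes Spearman's rank correlation (as the Pearson correlation of average ranks) and applies Holm's step-down adjustment to a list of p-values. It is a sketch only: the study used R, the function names are hypothetical, and the computation of the p-values themselves is left out.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

With the toy input `[0.01, 0.04, 0.03]`, the smallest p-value is multiplied by 3 and the next by 2, and the largest inherits the running maximum, so only the first remains below 0.05 after adjustment.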
(iii) To describe the relationship between the new response variable introduced here, called OvAge, and the set of inputs (the independent variables), we applied generalized linear model (GLM) theory, which provides a unified methodology for modeling all types of response variables, such as continuous, binary and ordinal responses or variables in the form of proportions. Because it was hypothesized that, in the healthy population, OvAge equals chronological age, which is neither continuous nor normally distributed, a Poisson distribution was chosen as the random component of the GLM for modeling the expected value of OvAge. The identity and logarithm functions were chosen as candidate link functions.
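In generic notation, the model family described in step (iii) can be written as follows. The symbols are assumptions introduced for illustration and do not appear in the original text: $\mu_i$ is the expected OvAge of subject $i$, $x_{ij}$ the value of predictor $j$ for that subject, $\beta_j$ the regression coefficients, $p$ the number of predictors, and $g$ the link function.

```latex
\begin{aligned}
\mathrm{OvAge}_i &\sim \mathrm{Poisson}(\mu_i), \\
g(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \\
g(\mu) &= \mu \quad \text{(identity link)}
\qquad \text{or} \qquad
g(\mu) = \log \mu \quad \text{(logarithm link)}.
\end{aligned}
```

Under the identity link each coefficient acts additively on the expected OvAge, whereas under the logarithm link it acts multiplicatively through $\mu_i = \exp(\beta_0 + \sum_j \beta_j x_{ij})$.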
(iv) In keeping with Ockham's razor (the law of parsimony), and to avoid collinearity problems due to possible dependence among predictor variables, variable selection methods were applied. Stepwise selection, the least absolute shrinkage and selection operator (LASSO) and sure independence screening (SIS) were considered. Boosting, an ensemble meta-algorithm in supervised machine learning that converts weak learners into strong ones, was also considered for model construction. The overall best model was chosen using the Akaike information criterion (AIC). For completeness, 10-fold cross-validated accuracy and leave-one-out accuracy were also calculated for each generated model. In 10-fold cross-validation, the dataset is divided into 10 groups of approximately equal size; each group in turn is held out as the test set while the generalized linear model is fitted to the other nine groups (the training set). In leave-one-out cross-validation, only one observation at a time is held out as the test set. As further evidence of the significance of the linear terms, an ANOVA (Chi-squared) test was performed to compare each model with the null model in terms of deviance.
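The resampling scheme and the selection criterion in step (iv) can be sketched in standard-library Python as follows. This is an illustration under stated assumptions rather than the authors' R code: the function names and the fixed random seed are hypothetical, and model fitting itself is omitted. AIC is computed from its usual definition, twice the number of estimated parameters minus twice the maximized log-likelihood.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Split record indices 0..n_records-1 into k (train, test) pairs.

    Each fold is used exactly once as the test set; setting k equal to
    n_records gives leave-one-out cross-validation.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Striding by k yields folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood
```

Calling `k_fold_indices(149, 10)` on a dataset of the size reported here gives nine test folds of 15 records and one of 14, and `k_fold_indices(149, 149)` gives the leave-one-out splits.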
(v) The best model was rebuilt taking into account all possible interactions among covariates to assess any improvement in AIC value and accuracy. Interaction terms capture interactive effects, and they were considered when the main effects were significant.
(vi) To evaluate the quality of fit of the final model, several diagnostic techniques were applied. The influence measures considered were DFBETAS, DFFITS, the covariance ratio (CR), Cook’s distance (D) and the leverages (h). The R function influence.measures() was used to identify potentially influential observations according to R’s criteria for the influence measures above. In a manual and iterative process, the identified influential cases were first checked by specialists and then removed from the dataset. The final model was repeatedly updated without the influential cases, with every retained predictor significant at a p-value <0.01.
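To make two of these influence measures concrete, the sketch below computes leverages and Cook's distance for an ordinary least-squares fit with a single predictor. This is a deliberately simplified stand-in, not the authors' procedure: the study fitted a Poisson GLM and used R's influence.measures(), whereas this example uses the closed-form expressions for simple linear regression, and the function name is hypothetical.

```python
def simple_ols_influence(x, y):
    """Leverages h_i and Cook's distances D_i for the fit y = a + b * x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    # Leverage: how far each x value lies from the mean of x.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage.
    cooks = [e * e * h / (p * s2 * (1 - h) ** 2)
             for e, h in zip(residuals, leverages)]
    return leverages, cooks
```

On toy data such as `x = [1, 2, 3, 4, 5, 6]` and `y = [1.1, 2.0, 2.9, 4.2, 5.0, 12.0]`, the last observation has both a large residual and high leverage, so it receives the largest Cook's distance and would be the first case referred for specialist review.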
(vii) The final generalized linear model was applied to the POI and PCOS datasets. The hypothesis was that, in POI, ovarian age is greater than chronological age and, in PCOS, ovarian age is lower than chronological age.