Diagnostic accuracy of insulin-like growth factor-1 for screening growth hormone deficiency: a prospective, single-center, diagnostic accuracy study

Purpose We evaluated the diagnostic accuracy of insulin-like growth factor-1 (IGF-1) for screening growth hormone deficiency (GHD) to determine the usefulness of IGF-1 as a screening test. Methods Among 298 consecutive children who had short stature or decreased height velocity, we measured IGF-1 levels and performed growth hormone (GH) secretion tests using clonidine, arginine, and, in cases where the two tests gave different results, L-dopa. Patients with congenital abnormalities were excluded. GHD was defined as peak GH ≤ 6.0 ng/mL in two tests. Results We identified 60 and 238 patients with and without GHD, respectively. The mean IGF-1 (SD) was not significantly different between the GHD and non-GHD groups (p = 0.23). Receiver operating characteristic curve analysis demonstrated the best diagnostic accuracy at an IGF-1 cutoff of −1.493 SD, with 0.685 sensitivity, 0.417 specificity, 0.25 positive and 0.823 negative predictive values, and 0.517 area under the curve. Correlation analysis revealed that none of the patients' characteristics increased the diagnostic power of IGF-1.


Introduction
The diagnosis of GH deficiency (GHD) requires that growth hormone (GH) secretion be assessed by stimulation tests. However, there are several challenges associated with the GH secretion test [1].
Pharmacological stimuli are not physiological, and their accuracy is poor. It is well known that normally growing children may have falsely low GH responses. Moreover, the diagnostic criteria for GHD are not uniform worldwide [2]. Furthermore, the GH secretion test may be influenced by factors such as obesity, undernutrition, sex, age, puberty, and the presence of chronic diseases. It also has potential adverse reactions and may sometimes result in hospitalization. Therefore, a predictive biomarker for GHD is desired to avoid unnecessary GH secretion tests.
Insulin-like growth factor-1 (IGF-1) is a small polypeptide hormone secreted by the liver when stimulated by GH. As serum levels of IGF-1 show little circadian variation, IGF-1 has been considered as a predictive biomarker for GHD [2]. The utility of IGF-1 for the screening of GHD was reported in some studies [3][4][5][6][7] but not in others [1,8]. As the study settings in these reports were different, it is difficult to compare the diagnostic accuracy of IGF-1. For example, the inclusion criteria for the GH secretion test comprise not only short stature but also bone age [2], target height [3,6], and catch-up growth [3]. Furthermore, the studies used different GH cutoff levels [1, 3-6, 8, 9]. Therefore, a prospective cohort study was required to determine the diagnostic accuracy of IGF-1. We prospectively analyzed a cohort of children with short stature to evaluate the diagnostic accuracy of IGF-1 for the diagnosis of GHD.

Patients
This was a prospective cross-sectional study on children with short stature or decreased growth velocity who were examined at Aichi Medical University Hospital between April 2015 and March 2020. All study evaluations and procedures were performed in accordance with the Declaration of Helsinki and the Ethical Guidelines for Medical and Health Research Involving Human Subjects established by the Japanese Government. We used the following inclusion criteria: (a) referred to Aichi Medical University for the evaluation of short stature or decreased growth velocity; (b) height ≤ −2 SD, or height velocity ≤ −1.5 SD for > 2 years, relative to the mean for sex and age [10]; and (c) > 1 year of age and before the completion of puberty, according to Tanner stages. The exclusion criteria were the presence of recognized congenital abnormalities, such as hypothyroidism; small for gestational age; Turner's syndrome; and trisomy 21.
General biochemical tests, thyroid function test, bone age, and IGF-1 were examined before the GH secretion test in consecutive patients who met the inclusion criteria. The radius, ulna, and short bone method was used for evaluating bone age [11]. The patients were divided into GHD and non-GHD groups according to the response to the GH secretion test (GHD, 60; non-GHD, 238). In Japan, GHD is diagnosed when the peak GH is ≤ 6.0 ng/mL in two GH secretion tests [9]. The cutoff of 6 ng/mL was determined by the Japanese National Health Insurance program. Stimulation tests using clonidine, arginine, and L-dopa were performed in that order, using the algorithm shown in Fig. 1. GHD was diagnosed if the GH peak levels were ≤ 6 ng/mL in two stimulation tests. If the GH peak was above the cutoff level in the clonidine stimulation test, the next stimulation test was not performed. If the GH peak of the arginine stimulation test was 6-8 ng/mL, the third (L-dopa) stimulation test was performed. If the GH peak of the arginine stimulation test was > 8 ng/mL, the third test was not performed, as GHD was unlikely to be present. Glucagon was not used in this study because it requires a long examination time of 180 min. Insulin was also not used because of its potentially serious side effects and because we were not accustomed to its use.
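The sequential algorithm described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `classify_ghd`, the constant names, and the return strings are hypothetical, and the handling of values exactly at 6.0 or 8.0 ng/mL (an arginine peak in the interval (6, 8] triggers L-dopa) is an assumption, since the text only gives the range as "6-8 ng/mL".

```python
from typing import Optional

GHD_CUTOFF = 6.0    # ng/mL, diagnostic cutoff stated in the text
RETEST_UPPER = 8.0  # ng/mL, upper bound of the arginine range that triggers L-dopa


def classify_ghd(clonidine_peak: float,
                 arginine_peak: Optional[float] = None,
                 ldopa_peak: Optional[float] = None) -> str:
    """Return 'GHD', 'non-GHD', or the name of the next test still needed.

    Tests are assumed to be run in the order clonidine, arginine, L-dopa.
    """
    # A sufficient response to clonidine ends the work-up.
    if clonidine_peak > GHD_CUTOFF:
        return "non-GHD"
    if arginine_peak is None:
        return "needs arginine"
    # Two low peaks in a row meet the diagnostic criterion.
    if arginine_peak <= GHD_CUTOFF:
        return "GHD"
    # A clearly sufficient arginine response: no third test.
    if arginine_peak > RETEST_UPPER:
        return "non-GHD"
    # Borderline arginine response (assumed interval (6, 8]): L-dopa decides.
    if ldopa_peak is None:
        return "needs L-dopa"
    return "GHD" if ldopa_peak <= GHD_CUTOFF else "non-GHD"
```

For example, `classify_ghd(4.0, 7.0, 5.5)` corresponds to a patient diagnosed on the third test, whereas `classify_ghd(4.0, 9.0)` is classified as non-GHD without an L-dopa test.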
After overnight fasting, the stimulation test was started at 6:30 for children < 6 years old and at 9:00 for those > 6 years old because of fasting tolerance. Sampling was done at 0, 30, 60, 90, and 120 minutes.
Clonidine (5 µg/kg), arginine (10 mg/kg), and L-dopa (10 mg/kg) were administered as the stimuli for the GH secretion test. Sex steroids were not used for priming before the GH secretion test. After the diagnosis of GHD, head MRI was performed before starting GH replacement therapy.

Hormone assays
Serum IGF-1 was measured by electrochemiluminescence immunoassay (Elecsys IGF-1; Roche Diagnostics, Tokyo, Japan), which was calibrated against the WHO International Standard 02/254. The values of serum IGF-1 were transformed into SDs, according to the established reference ranges of the assay for sex and calendar age [12]. GH was measured by immunoenzymometric assay (E Test TOSOH II HGH; Tosoh Co., Ltd., Tokyo, Japan), which was standardized against the WHO International Standard 98/574. According to the manufacturer's datasheet, the intra- and interassay coefficients of variation (CV) for IGF-1 were < 10% and < 20%, and those for GH were < 10% and < 15%, respectively. As GH was measured in the hospital, we tested the intra-assay CV for GH in our hospital and found that it was 2% on average.
Interassay CV for GH in our hospital was not tested. IGF-1 was measured by the testing company.

Statistical analysis
We calculated point estimates for IGF-1 (SD) sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), diagnostic efficiency (DE), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) for predicting the presence of GHD. Data were shown as median (interquartile range) for chronological and bone age, and as mean ± SD for the other numerical variables. Based on the F-test, Student's t-test was performed in the case of homoscedasticity and the Mann-Whitney U test was performed in the case of unequal variances to compare the IGF-1 level and other variables between the two groups. Spearman's rank correlation coefficient test was performed to investigate the relationship of IGF-1 (SD) with age, bone age, height (SD), target height (SD), height velocity before examination (SD), weight (SD), body mass index (BMI) (SD), and maximum peak GH (ng/mL). Correlation was defined as very weak if < 0.2, weak if ≥ 0.2 and < 0.4, moderate if ≥ 0.4 and < 0.6, strong if ≥ 0.6 and < 0.8, and very strong if ≥ 0.8. Receiver operating characteristic (ROC) analysis with the Youden index was used to compare the discriminatory performances of IGF-1 in the diagnosis of GHD. Based on the area under the ROC curve (AUC), performance was considered as acceptable if > 0.7 and ≤ 0.8 and excellent if > 0.8.
All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan) [13], which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria). More precisely, it is a modified version of R commander designed to add statistical functions frequently used in biostatistics.
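The point estimates and the Youden-index cutoff named in this section can be illustrated with a short Python sketch. It is not the EZR procedure used in the study: the names `Confusion`, `tabulate`, `metrics`, and `best_cutoff` are hypothetical, and the code assumes that an IGF-1 (SD) value at or below the cutoff counts as a positive screen and that both GHD and non-GHD patients are present in the data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # GHD and screen-positive
    fp: int  # non-GHD but screen-positive
    fn: int  # GHD but screen-negative
    tn: int  # non-GHD and screen-negative


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def tabulate(igf1_sd, has_ghd, cutoff):
    """Cross-tabulate the screening result against GHD status."""
    tp = fp = fn = tn = 0
    for value, disease in zip(igf1_sd, has_ghd):
        positive = value <= cutoff  # assumption: low IGF-1 is screen-positive
        if positive and disease:
            tp += 1
        elif positive:
            fp += 1
        elif disease:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def metrics(c):
    """Compute the point estimates listed in the text from a 2x2 table."""
    sens = _ratio(c.tp, c.tp + c.fn)
    spec = _ratio(c.tn, c.tn + c.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "efficiency": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "plr": _ratio(sens, 1 - spec),
        "nlr": _ratio(1 - sens, spec),
        "youden": sens + spec - 1,
    }


def best_cutoff(igf1_sd, has_ghd):
    """Return the observed value that maximizes the Youden index."""
    candidates = sorted(set(igf1_sd))
    return max(
        candidates,
        key=lambda cut: metrics(tabulate(igf1_sd, has_ghd, cut))["youden"],
    )
```

Calling `best_cutoff(values, labels)` on a list of IGF-1 SD scores and a matching list of booleans returns the candidate cutoff; `metrics(tabulate(values, labels, cutoff))` then gives the corresponding point estimates.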

Results
The patients included in this study had a median age of 4.98 years (interquartile range, 3.21-9.38 years).
We identified 60 children with GHD and 238 children without GHD (non-GHD) (Fig. 1), with male preponderance of 53.4%. Four patients were diagnosed with organic GHD [lymphocytic hypophysitis (n = 2), craniopharyngioma (n = 1), and cerebral myeloma (n = 1)]. The backgrounds of these patients are described in Table 1. Height velocity after examination, body weight (SD), and BMI (SD) were significantly higher in the GHD group than in the non-GHD group. Maximum peak GH was significantly lower in the GHD group than in the non-GHD group.
To clarify the relationship between pretreatment IGF-1 and response to GH, we compared the height velocity (SD) between groups with IGF-1 above (n = 34) and below (n = 26) the cutoff value (−1.493 SD).
To assess the efficacy of the third stimulation test, patients diagnosed with GHD on the second (n = 39) and third tests (n = 21) were compared (Table 2). Pre- and posttreatment growth velocities (SD) were similar between these groups (pretreatment, −1.66 ± 2.36 vs. −0.89 ± 2.31, p = 0.232; posttreatment, 2.44 ± 3.00 vs. 2.87 ± 3.33, p = 0.618). For a more detailed comparison, height velocity was also compared within age and sex subgroups (Table 2). Age was classified into two categories: ≤ 9 years and > 9 years in boys, and ≤ 8 years and > 8 years in girls. In every subgroup, height velocity before and after the examination was not significantly different between those diagnosed on two and three tests (Table 2).

Discussion
We found that IGF-1 had poor accuracy, as demonstrated by a low AUC and poor sensitivity, specificity, and DE at the best cutoff of −1.493 SD. The correlation analysis revealed that none of the items increased the diagnostic power of IGF-1 for GHD screening.
IGF-1 has been reported to be useful in the screening of GHD in some studies [3][4][5][6][7] but not in others [1,8]. A likely reason for these contradictory results is that the patient groups and GHD cutoff values differed between these studies. First, the inclusion criteria may create differences in patient backgrounds. In previous studies on the usefulness of IGF-1 for GHD screening, patients were selected according to bone age [2], target height [3,6], or catch-up growth [3] in addition to short stature and/or height velocity. These variations in inclusion criteria might superficially improve the sensitivity and specificity of IGF-1. Second, different GH cutoff levels for GHD were selected: ≤ 5 ng/mL [5,6], ≤ 6 ng/mL [9], ≤ 7 ng/mL [1], ≤ 8 ng/mL [3,8], and ≤ 10 ng/mL [4]. In the case of ≤ 8 or ≤ 10 ng/mL [3,8], the prevalence of GHD in patients with short stature was > 30%, which was higher than that in our study (20.1%). Since disease prevalence affects sensitivity, specificity, PPV, and NPV, IGF-1 is not a useful screening test in a patient population with low prevalence of GHD. In the cohort of this study, the prevalence of GHD decreased to 12.8% when a GH cutoff level of 5 ng/mL was selected. Therefore, when evaluating the efficacy of IGF-1, comparisons should be made at the same GH cutoff levels.
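The dependence of the predictive values on prevalence can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name `predictive_values` is hypothetical, and it assumes that sensitivity and specificity stay fixed while prevalence changes, which will not hold exactly when the GH cutoff that defines GHD is itself changed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the screened population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Sensitivity and specificity are the values reported in this study;
    # 0.128 and 0.201 are the prevalences reported here for the 5 and
    # 6 ng/mL cutoffs, and 0.30 stands in for the "> 30%" of other studies.
    for prevalence in (0.128, 0.201, 0.30):
        ppv, npv = predictive_values(0.685, 0.417, prevalence)
        print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, PPV rises and NPV falls as prevalence increases, which is why results obtained at different GH cutoffs are hard to compare directly.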
Bone age, target height, and height velocity should be taken into consideration before selecting patients for the GH secretion test [2]. In our study, bone age, target height, and height velocity before the examination were similar between the GHD and non-GHD groups. Even after combining these conditions with IGF-1, the diagnostic power of IGF-1 for GHD screening did not increase. Therefore, it would be difficult to distinguish patients with GHD from those without GHD using those parameters.
To clarify the relationship between pretreatment IGF-1 and response to GH, height velocity (SD) was compared between groups with IGF-1 above (n = 34) and below (n = 26) the cutoff value (−1.493 SD). Pre- and posttreatment height velocities (SD) were similar between the groups. IGF-1 was reported to be weakly correlated with the clinical endpoints of GH treatment [14]. Therefore, it would be difficult to predict the degree of improvement prior to GH treatment using pretreatment IGF-1.
We performed the third stimulation test when the results of GH secretion in the first and second tests were different. Although a sufficient GH response in one stimulation test rules out GHD in most cases [2], the choice of drugs used in the stimulation test and the interpretation of the results depend on the facility [1,15]. In this study, pre- and posttreatment growth velocities were similar between the patients diagnosed with GHD on the second and third tests (Table 2). In every subgroup, height velocity before and after examination was not significantly different between those diagnosed on two and three tests. This result indicated that patients diagnosed with GHD by the third test have a similar response to growth hormone as those diagnosed by the traditional method. Therefore, the third stimulation test may have some significance in diagnosing patients with GHD.
This study had several limitations. First, immunoassay is not the most sensitive method for IGF-1 analysis. The variations in immunoassays used in different studies may result in variations in the reported efficacy of IGF-1. More accurate assays, such as LC-MS, may reveal the actual usefulness of IGF-1 for GHD screening. Second, the use of a third stimulation test is not a common practice. If one of the tests is normal, there is no need for a third one. Thus, if the cutoff for a normal GH peak is set at 6 ng/mL, all responses above 6 ng/mL should be considered normal. However, depending on the order of the stimulation tests, the diagnosis of GHD may vary among patients. For example, a patient with a peak GH < 6 ng/mL in stimulation tests A and B and ≥ 6 ng/mL in stimulation test C would not be diagnosed with GHD if the order of the stimulation tests were A, C, and B. There is no evidence on the order of stimulation tests, and the order varies from institution to institution. In the present study, the response to GH was similar in patients who had substandard results in two of two stimulation tests and in those who had substandard results in two of three stimulation tests. Therefore, it is necessary to accumulate such cases to clarify the significance of the third stimulation test.
In conclusion, IGF-1 level had poor diagnostic accuracy as a screening test for GHD. Correlation analysis revealed that none of the items increased the diagnostic power of IGF-1. Therefore, IGF-1 should not be used alone for the screening of GHD. A predictive biomarker for GHD should be developed in the future.

Table 2. Analysis of height velocity (HV) before and after examination according to age groups.

Figure 1. Algorithm of the stimulation tests using clonidine, arginine, and L-dopa. If peak growth hormone (GH) in both the clonidine and arginine tests was ≤ 6.0 ng/mL, GH replacement therapy was initiated. When the peak GH in the arginine test was 6.0-8.0 ng/mL, the third stimulation test using L-dopa was performed.