Diagnostic accuracy of IGF-1 for screening growth hormone deficiency: a prospective, single-center, diagnostic accuracy study

Purpose We evaluated the diagnostic accuracy of insulin-like growth factor-1 (IGF-1) for screening growth hormone deficiency (GHD) to determine the usefulness of IGF-1 as a screening test. Methods In 298 consecutive children who had short stature or decreased height velocity, we measured IGF-1 levels and performed growth hormone (GH) secretion tests using clonidine, arginine, and, when the results of these two tests differed, L-dopa. Patients with congenital abnormalities were excluded. GHD was defined as peak GH ≤ 6.0 ng/mL in the two tests. Results We identified 60 and 238 patients with and without GHD, respectively. The mean IGF-1 (SD) was not significantly different between the GHD and non-GHD groups (p = 0.23). Receiver operating characteristic curve analysis demonstrated the best diagnostic accuracy at an IGF-1 cutoff of −1.493 SD, with sensitivity of 0.685, specificity of 0.417, positive predictive value of 0.25, negative predictive value of 0.823, and area under the curve of 0.517. Spearman’s rank correlation coefficient showed that IGF-1 (SD) was weakly correlated with age, bone age, height velocity before examination, weight (SD), and BMI (SD) and very weakly correlated with height (SD), target height (SD), and maximum GH peak. Correlation analysis revealed none


Introduction
The diagnosis of growth hormone (GH) deficiency (GHD) requires assessment of GH secretion by stimulation tests. However, there are several challenges associated with the GH secretion test [1]. Pharmacological stimuli are not physiological, and their accuracy is poor. It is well known that normally growing children may have falsely low GH responses. Moreover, the diagnostic criteria for GHD are not uniform worldwide [2]. Furthermore, the GH secretion test may be influenced by factors such as obesity, undernutrition, sex, age, puberty, and the presence of chronic diseases. It also has potential adverse reactions and may sometimes result in hospitalization. Therefore, a predictive biomarker for GHD is desired to avoid unnecessary GH secretion tests.
Insulin-like growth factor-1 (IGF-1) is a small polypeptide hormone secreted by the liver when stimulated by GH. As serum levels of IGF-1 show little circadian variation, IGF-1 has been considered as a predictive biomarker for GHD [2]. The utility of IGF-1 for the screening of GHD was reported in some studies [3-7] but not in others [1,8]. As the study settings in these reports were different, it is difficult to compare the diagnostic accuracy of IGF-1. For example, the inclusion criteria for the GH secretion test have comprised not only short stature but also bone age [2], target height [3,6], and catch-up growth [3]. Furthermore, the studies used different GH cutoff levels [1, 3-6, 8, 9]. Therefore, a prospective cohort study was required to determine the diagnostic accuracy of IGF-1. We prospectively analyzed a cohort of children with short stature to evaluate the diagnostic accuracy of IGF-1 for the diagnosis of GHD.

Patients
This was a prospective cross-sectional study on children with short stature or decreased growth velocity who were examined at Aichi Medical University Hospital between April 2015 and March 2020. We used the following inclusion criteria: (a) referred to Aichi Medical University for the evaluation of short stature or decreased growth velocity; (b) height of ≤ −2 SD, or height velocity of ≤ −1.5 SD for >2 years, relative to the mean for sex and age [10]; and (c) >1 year of age and before the completion of puberty, according to Tanner stages. The exclusion criteria were the presence of recognized congenital abnormalities, such as hypothyroidism; small for gestational age; Turner's syndrome; and trisomy 21.
General biochemical tests, thyroid function test, bone age, and IGF-1 were examined before the GH secretion test in consecutive patients who met the inclusion criteria. The radius, ulna, and short bone method was used for evaluating bone age [11]. The patients were divided into GHD and non-GHD groups according to the response to the GH secretion test. In Japan, GHD is diagnosed when the peak GH is ≤ 6.0 ng/mL in two GH secretion tests [9]. This cutoff of 6 ng/mL was determined by the Japanese National Health Insurance program. Stimulation tests using clonidine, arginine, and L-dopa were performed in that order, using the algorithm shown in Figure 1. GHD was diagnosed if the GH peak levels were ≤ 6 ng/mL in two stimulation tests. If the GH peak was above the cutoff level in the clonidine stimulation test, the next stimulation test was not performed. If the GH peak of the arginine stimulation test was 6-8 ng/mL, the third stimulation test, using L-dopa, was performed. If the GH peak of the arginine stimulation test was > 8 ng/mL, the third test was not performed as GHD was unlikely to be present. Glucagon and insulin were not used in this study because the glucagon test requires a long examination time of 180 min and insulin carries a risk of severe hypoglycemia.
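The sequential testing rule described above can be written as a small decision function. This is only an illustrative sketch, not the authors' software: the function name and return labels are hypothetical, the boundary handling for the 6-8 ng/mL range (greater than 6.0 and up to 8.0) is an assumption, and the text does not state the cutoff applied to the L-dopa test, so the same 6.0 ng/mL value is assumed here.

```python
from typing import Optional

GHD_CUTOFF = 6.0       # ng/mL; a peak GH at or below this is a deficient response
GRAY_ZONE_UPPER = 8.0  # ng/mL; arginine peaks above 6.0 and up to 8.0 trigger L-dopa


def classify_ghd(clonidine_peak: float,
                 arginine_peak: Optional[float] = None,
                 ldopa_peak: Optional[float] = None) -> str:
    """Return 'GHD', 'non-GHD', or the name of the next test still needed."""
    # Step 1: a sufficient clonidine response ends the work-up.
    if clonidine_peak > GHD_CUTOFF:
        return "non-GHD"
    # Step 2: arginine is performed only after a deficient clonidine response.
    if arginine_peak is None:
        return "needs arginine test"
    if arginine_peak <= GHD_CUTOFF:
        return "GHD"      # two deficient responses
    if arginine_peak > GRAY_ZONE_UPPER:
        return "non-GHD"  # GHD considered unlikely, no third test
    # Step 3: arginine peak in the 6-8 ng/mL range, so L-dopa decides.
    if ldopa_peak is None:
        return "needs L-dopa test"
    # Assumption: the 6.0 ng/mL cutoff also applies to the L-dopa test.
    return "GHD" if ldopa_peak <= GHD_CUTOFF else "non-GHD"


# Hypothetical peak values (ng/mL), not study data.
print(classify_ghd(4.0, 7.0, 3.0))
```

Returning the name of the pending test, rather than raising an error, mirrors the fact that later tests are performed only when the earlier result requires them.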
After overnight fasting, the stimulation test was started at 6:30 for children <6 years old and at 9:00 for those >6 years old because of fasting tolerance. Sampling was done at 0, 30, 60, 90, and 120 minutes. Clonidine (5 µg/kg), arginine (10 mg/kg), and L-dopa (10 mg/kg) were administered as the stimuli for the GH secretion test. After the diagnosis of GHD, head MRI was performed before starting GH replacement therapy.

Hormone assays
Serum IGF-1 was measured by electrochemiluminescence immunoassay (Elecsys IGF-1; Roche Diagnostics, Tokyo, Japan), which was calibrated against the WHO International Standard 02/254. The values of serum IGF-1 were transformed into SDs, according to the established reference ranges of the assay for sex and calendar age [12]. GH was measured by immunoenzymometric assay (E Test TOSOH II HGH; Tosoh Co., Ltd., Tokyo, Japan), which was standardized against the WHO International Standard 98/574.

Statistical analysis
We calculated point estimates for IGF-1 (SD) sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), diagnostic efficiency (DE), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) for predicting the presence of GHD. Data are shown as median (interquartile range) for chronological and bone age, and as mean ± SD for the other numerical variables. Based on the F-test, Student's t-test was performed in the case of homoscedasticity and the Mann-Whitney U test was performed in the case of unequal variances to compare the IGF-1 level and other variables between the two groups. Spearman's rank correlation coefficient test was performed to investigate the relationship of IGF-1 (SD) with age, bone age, height (SD), target height (SD), height velocity before examination (SD), weight (SD), body mass index (BMI) (SD), and maximum peak GH (ng/mL). All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan) [13], which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria). More precisely, it is a modified version of R commander designed to add statistical functions frequently used in biostatistics.
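For readers less familiar with these indices, the following sketch shows how they are conventionally derived from a 2×2 table of test result against reference diagnosis. It is not the EZR procedure used in the study. The function name is hypothetical, DE is assumed to mean the proportion of all patients classified correctly, and the counts in the example call are invented for illustration and are not study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy indices from a 2x2 table.

    tp/fn: patients with the disease who test positive/negative.
    fp/tn: patients without the disease who test positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Assumption: DE is the overall proportion correctly classified.
        "de": (tp + tn) / (tp + fp + fn + tn),
        # PLR is undefined when specificity is exactly 1 (division by zero).
        "plr": sensitivity / (1 - specificity),
        "nlr": (1 - sensitivity) / specificity,
    }


# Invented counts for illustration only.
for name, value in diagnostic_metrics(tp=30, fp=20, fn=10, tn=40).items():
    print(f"{name}: {value:.3f}")
```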

Results
The patients included in this study had a median age of 4.98 years (interquartile range, 3.21-9.38 years). We identified 60 children with GHD and 238 children without GHD (non-GHD) (Figure 1), with a male preponderance of 53.4%. Four patients were diagnosed with organic GHD [lymphocytic hypophysitis (n = 2), craniopharyngioma (n = 1), and cerebral myeloma (n = 1)]. The background characteristics of these patients are described in Table 1. Height velocity after examination, body weight (SD), and BMI (SD) were significantly higher in the GHD group than in the non-GHD group. Maximum peak GH was significantly lower in the GHD group than in the non-GHD group.

Discussion
We found that IGF-1 had poor accuracy as demonstrated by low AUC, and poor sensitivity, specificity, and DE for the best cutoff of −1.493 SD. The correlation analysis revealed that none of the items increased the diagnostic power of IGF-1 for GHD screening.
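As a reminder of what the AUC and the "best" cutoff measure, the sketch below computes both for a marker in which lower values indicate disease, as with IGF-1 (SD). It rests on several assumptions: the source does not state how the best cutoff was chosen, so the Youden index (sensitivity + specificity − 1) is used here as one common criterion; the function names are hypothetical; and the scores are invented, not study data. An AUC near 0.5 means the marker separates the groups no better than chance.

```python
def roc_auc(diseased, healthy):
    """AUC as the probability that a diseased patient has a LOWER score
    than a non-diseased patient (ties count as one half)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d < h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def best_cutoff_youden(diseased, healthy):
    """Scan observed scores as cutoffs; 'test positive' means score <= cutoff.
    Returns (cutoff, youden_j, sensitivity, specificity) for the first maximum."""
    best = None
    for cutoff in sorted(set(diseased) | set(healthy)):
        sens = sum(s <= cutoff for s in diseased) / len(diseased)
        spec = sum(s > cutoff for s in healthy) / len(healthy)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Invented IGF-1 (SD) scores for illustration only.
ghd_scores = [-2.5, -2.0, -1.0, 0.5]
non_ghd_scores = [-1.5, -0.5, 0.0, 1.0]
print(roc_auc(ghd_scores, non_ghd_scores))
print(best_cutoff_youden(ghd_scores, non_ghd_scores))
```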
IGF-1 has been reported to be useful in the screening of GHD in some studies [3-7] but not in others [1,8]. A likely reason for these contradictory results is that the patient groups and GHD cutoff values differed between these studies. First, the inclusion criteria may create differences in patient backgrounds. In previous studies on the usefulness of IGF-1 for GHD screening, patients were selected according to bone age [2], target height [3,6], or catch-up growth [3] in addition to short stature and/or height velocity. These variations in inclusion criteria might superficially improve the sensitivity and specificity of IGF-1. Second, different GH cutoff levels for GHD were selected: ≤5 ng/mL [5,6], ≤6 ng/mL [9], ≤7 ng/mL [1], ≤8 ng/mL [3,8], and ≤10 ng/mL [4]. In the case of ≤8 or ≤10 ng/mL [3,8], the prevalence of GHD in patients with short stature was >30%, which was higher than that in our study (20.1%). Since disease prevalence affects sensitivity, specificity, PPV, and NPV, IGF-1 is not a useful screening test in a patient population with low prevalence of GHD. In the cohort of this study, the prevalence of GHD decreased to 12.8% when the GH cutoff level of 5 ng/mL was selected. Therefore, when evaluating the efficacy of IGF-1, comparisons should be made at the same GH cutoff levels.
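The dependence of predictive values on prevalence can be made concrete with Bayes' theorem. The sketch below holds sensitivity and specificity fixed, which is an idealization, and recomputes PPV and NPV at the three prevalence levels mentioned above (12.8%, 20.1%, and 30%). The function name and the sensitivity and specificity values are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics; prevalence levels taken from the text.
for prevalence in (0.128, 0.201, 0.30):
    ppv, npv = predictive_values(0.70, 0.60, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the test characteristics held constant, PPV rises and NPV falls as prevalence increases, which is one reason results obtained at different GH cutoff levels are hard to compare directly.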
Bone age, target height, and height velocity should be taken into consideration before selecting patients for the GH secretion test [2]. In our study, bone age, target height, and height velocity before the examination were similar between the GHD and non-GHD groups. Even after combining these conditions with IGF-1, the diagnostic power of IGF-1 for GHD screening did not increase. Therefore, it would be difficult to distinguish patients with GHD from those without GHD using those parameters.
To clarify the relationship between pre-treatment IGF-1 and response to GH, height velocity (SD) between groups with IGF-1 above (n = 34) and below (n = 26) the cutoff value (−1.493 SD) was compared. Pre- and post-treatment height velocity (SD) were similar between the groups. IGF-1 was reported to be weakly correlated with the clinical endpoints of GH treatment [14]. Therefore, it would be difficult to predict the degree of improvement prior to GH treatment using pre-treatment IGF-1.
We performed the third stimulation test when the results of GH secretion in the first and second tests differed. Although a sufficient GH response in one stimulation test rules out GHD in most cases [2], the utilization and interpretation of the drugs used in the stimulation test depend on the facility [1,15]. In this study, pre- and post-treatment growth velocities were similar between the patients diagnosed with GHD on the second and third tests. This indicates that, regardless of the number of stimulation tests, patients with GHD have a similar response to growth hormone. Therefore, the third stimulation test may be effective in diagnosing patients with GHD.
This study had several limitations. First, immunoassay for IGF-1 analysis is not the most sensitive assay. The variations in immunoassays used in different studies may result in variations in the reported efficacy of IGF-1. More accurate assays, such as LC-MS, may reveal the actual usefulness of IGF-1 for GHD screening. Second, it is unclear whether the third stimulation test should be performed or not. In this study, the third stimulation test was performed when the second stimulation test provided a result of 6-8 ng/mL. However, when the first stimulation test provided a result of 6-8 ng/mL, the next test was not performed. Depending on the order of the tests, the diagnosis of GHD may vary. Therefore, it needs to be clarified whether the third stimulation test should be performed.
In conclusion, IGF-1 level had poor diagnostic accuracy as a screening test for GHD. Correlation analysis revealed that none of the items increased the diagnostic power of IGF-1. Therefore, IGF-1 should not be used alone for the screening of GHD. A predictive biomarker for GHD should be developed in the future.

Figure 1 Algorithm of the stimulation tests using clonidine, arginine, and L-dopa. If peak GH (growth hormone) in both clonidine and arginine tests was ≤6.0 ng/mL, GH replacement therapy was initiated. When the peak GH in arginine test was 6.0-8.0 ng/mL, the third stimulation test using L-dopa was performed.