**Seismic Data in the Taupo Volcanic Zone.** We used continuous seismic data recorded by broadband seismometers during August - December 2010 by**47**. Only 25 seismometers for ambient noise data were used, considering the quality of data. The seismometers were distributed over many volcanic geothermal systems (Fig. 1a). The sampling time interval is 100 ms. The distances between the seismometers were spaced between 5 and 45 km (Fig. 1b), and the area of coverage is approximately 45 km × 45 km. We should note that geothermal reservoirs and magmatic heat sources can last more than a couple of thousand years**30,48**; therefore, velocity (derived from four-month data) and temperature models (including exploited areas) may not change significantly over time because of continuous heat sources.

**Analysis of surface wave dispersion.** Our first step involved estimating phase velocity dispersion curves between station pairs from ambient noise data. We have pre-processed the ambient noise data by dividing the vertical component of the continuous ambient noise data into 30-min segments and applied the bandpass filtering at 0.2–5Hz with instrument response correction. By cross-correlating data and linearly stacking all segments for all station pairs, we could observe surface wave propagation on the stacked cross-correlation (CC) functions in the 0.1–1 Hz frequency band (Fig. 2a). Then we applied a whitening and velocity filter to cross-correlation functions 17, and extracted phase velocity dispersion curves for each pair of stations through the zero-crossing method**17,49** (Figs. 2b-c). Finally, we performed surface wave tomography to obtain a 3D S-wave velocity model**19,21**.

The cross-spectrum between two seismometers was calculated using the following equation.

$${\rho }_{z}\left(f\right)= \frac{{u}_{i}\left(f\right){u}_{j}^{*}\left(f\right)}{\left|{u}_{i}\left(f\right)\right|\left|{u}_{j}\left(f\right)\right|};\left(1\right)$$

where \({\rho }_{z}\) is the vertical cross-spectrum, \(f\) is the frequency, \(u\) is the seismic record in the frequency domain from station *i* or *j*, and the asterisk denotes the complex conjugate. Cross-spectrum was used to obtain the phase velocity by the zero-crossing method.

The zero-crossing method**49** estimates phase velocities between station pairs by fitting zeros of the observed cross-spectrum to zeros of the theoretical cross-spectrum in the frequency domain. The phase velocity *C* obtained by the zero-crossing method is expressed as follows.

Here, we described other important points for our velocity analysis. We used only the daily cross-correlation waveforms that resembled others and rejected bad station/data. We determined the zone of phase velocity picking within one wavelength and six wavelength criteria based on the phase velocity and frequency diagram of its station pair (between red lines in Figs. 2b-c). The unique zero-cross dispersion curve (phase velocity or group velocity) is within the zone of the acceptable range of phase velocities (Fig. 2c). Noticeably, Rayleigh group velocities were determined from the stacked cross-correlation functions using continuous wavelet transform analysis from**50**.

Zero-cross dispersion curve analysis is different from surface wave dispersion curved that was analyzed based on time-frequency representation of the data (i.e., multiple filter technique or continuous wavelet transform analysis in which the dispersion curves were picked based on high energy semblance)**51**. We should note that the surface wave arrival time is thus related to the propagation of the wave energy, defined as an energy maximum, which can be difficult to identify due to the presence of other seismic waves through scattering, multipathing, wave type conversions, and other unwanted signals and noises**51**. The velocity filter and the zero-crossing method**17** thus minimized these issues and provided stable dispersion curves in geothermal velocity analysis. A single zero-cross dispersion curve can be picked/selected, which is not changed based on a variation of a high maximum energy semblance velocity map.

**Ambient noise tomography for S-wave inversion.** For this study, we employed a direct 3D S-wave inversion analysis using DSurfTomo**21** based on the LSQR inversion approach**52** in Eq. (3). After selecting all phase velocity dispersion curves (Fig. 2), 3D surface wave tomography was applied to invert these phase velocity pairs into a 3D S-wave velocity model**21**. In this inversion, one-dimensional S-wave velocity models (i.e., averaging from 1.1 times phase velocity pairs; black circles in Fig. 2e) were used as initial velocity models. We used 10 iterations for the grid size (0.015ox0.015o) and depth direction (< 6km) with a smoothing constraint for the speed change and a damping value in the DSurfTomo program based on several trials and errors. The outcome results were not corrected for topographic effects due to minimal elevation difference (Fig. 1b).

The main equation for the S-wave velocity inversion is as follows:

$$\delta {t}_{i}f=\sum _{k=1}^{K}-\frac{{v}_{ik}}{{C}_{k}^{2}}\sum _{j=1}^{J}\left[{R}_{\alpha }{Z}_{j}\frac{\partial {C}_{k}}{\partial {Vp}_{k}{Z}_{j}}+ {R}_{\rho }{Z}_{j}\frac{\partial {C}_{k}}{\partial {\rho }_{k}{Z}_{j}}+ \frac{\partial {C}_{k}}{\partial {Vs}_{k}{Z}_{j}}\right]{|}_{{\theta }_{k}} \delta {Vs}_{k}{Z}_{j}=\sum _{l=1}^{M}{G}_{il}{m}_{jl} ;\left(3\right)$$

where \({t}_{i}f\) is the calculated travel times from a reference model, which can be updated in the inversion, the \({v}_{ik}\) denotes the bilinear interpolation coefficients of the raypaths associated with the *i*th traveltime datum, and \({C}_{k}\) represents the phase velocity at the *k*th 2D surface grid point at frequency *f*, the scaling factors \({R}_{\alpha }{Z}_{j}\) and \({R}_{\rho }{Z}_{j}\) were derived based on the empirical relations 53; \({\theta }_{k}\) represents the 1D reference model and the *k*th surface grid point on the surface; \({Vp}_{k}{Z}_{j}\), \({Vs}_{k}{Z}_{j}\) and \({\rho }_{k}{Z}_{j}\) are the P-wave velocity, S-wave velocity, and mass density at the *j*th node in the depth direction, respectively. The total number of grid points for the 3D model is *M = KJ*. The direct surface wave tomography described in the equation above can be solved for model \(m\) (i.e., S-wave velocity) by further formulated in matrix inverse form of the travel time residual vector for all raypaths at different frequencies, and \(G\) is the data sensitivity matrix. This is solved by the LSQR algorithm of**52**.

**Analysis of machine learning models**. To predict the spatial variation of temperature, we used a 3D S-wave velocity model derived from ambient noise tomography and seven borehole logs**46** in Ngatamiri geothermal reservoir (Fig. 3). Well data and our 1D S-wave velocity were used to train the training model (Fig. 3). We used machine learning model algorithms in MATLAB, which is a statistical technique for building the linear or non-linear regression model between predictor and response variables. There are three main steps, regardless of any machine learning model: training data, test data, and validation data. The predictor variables or independent variables were used as inputs or features to train the regression model, then we optimized the model using the polynomial fit function, and finally we can automatically obtain the best polynomial degree for the model that is not under-fitted or overfitted based on the validation data. It is dependent on which we split the training data and validation set to predict the output by the cross-validation or holdout technique in MATLAB.

Through our machine learning approach, we built various regression models and predicted the results. We automatically performed all trained models and compared all of them (i.e., nonlinear regression models such as Decision/regression Trees, Support Vector Machines, Gaussian Progress Regression models, Kernel Approximation Regression Models, Ensembles of Trees, and Neural Networks) presented in Table 1. Then, we can find the best model that is not under/over fitted**23,54**. Our predictors/input parameters are 7 temperature - depth profiles, 1D velocity of each well location as a training model and performed validation model based on cross-validation fold technique (Fig. 3); as a result, the best-performing model is Gaussian Process Regression (GPR) model. In this study, only data from local wells that we could access were used. If we use more well data and their distribution over a large area, the accuracy can be most optimized.

The most suitable model in our study area is based on the GPR model, which we briefly described below. A non-linear fit function of the parameters can be formulated as follows:

$${y}_{i} = f\left[x,\right]+ϵ ;\left(4\right)$$

where \({y}_{i}\) is the output/response feature, \(f\) is function to evaluate the input/predictor feature \({x}_{i}\), and \(\) is the regression coefficients and \(ϵ\) represents the error terms.

The GPR is the nonparametric kernel-based probabilistic model. Response \(y\) (e.g., temperature) can be modeled from:

$$P\left({y}_{i}|f\left({x}_{i}\right),{x}_{i}\right) \sim N\left({y}_{i}|{h\left({x}_{i}\right)}^{T} + f\left({x}_{i}\right),{\sigma }^{2}\right) ;\left(5\right)$$

where \(N\left(0,{\sigma }^{2}\right) \sim ϵ. {\sigma }^{2}\)is error variance, \(h\)is the explicit basis function. Based on this GPR model, the fit model will be formed by obtaining the basis function coefficients, \(\), the noise variance, \({\sigma }^{2}\), and the hyperparameters of the kernel function, derived from the input data set while training the GPR model. The other set of fit functions relevant to many other models (e.g., Decision Trees, Support Vector Machines, Neural Networks), and regression metrics to evaluate/validate the model performance based on residual \({r}_{i }= {y}_{i}-\widehat{{y}_{i}}\) in which \(\widehat{{y}_{i}}\) refer to the predicted response valued from a regression model as follows:

Mean Absolute Error (MAE) \(=\)\(\frac{1}{n}{\sum }_{i=1}^{n}\left|{y}_{i}-\widehat{{y}_{i}}\right|;\)

Mean Square Error (MSE) \(=\) \(\frac{1}{n}{\sum }_{i=1}^{n}{\left({y}_{i}-\widehat{{y}_{i}}\right)}^{2}\);

Root Mean Square Error (RMSE)\(= \sqrt{MSE}\)

and R-square \(=\) \(\frac{ \frac{1}{n}{\sum }_{i=1}^{n}{\left({y}_{i}-\stackrel{-}{{y}_{i}}\right)}^{2} - \frac{1}{n}{\sum }_{i=1}^{n}{\left({y}_{i}-\widehat{{y}_{i}}\right)}^{2}}{ \frac{1}{n}{\sum }_{i=1}^{n}{\left({y}_{i}-\stackrel{-}{{y}_{i}}\right)}^{2}}\) in which \(\stackrel{-}{{y}_{i}}\) is the mean response. The R-square is valued between 0 and 1. The perfect fitting model is when R-square is close to 1.