Tephra deposit inversion by coupling Tephra2 with the Metropolis-Hastings algorithm: algorithm introduction and demonstration with synthetic datasets

In this work we couple the Metropolis-Hastings algorithm with the volcanic ash transport model Tephra2, and present the coupled algorithm as a new method to estimate the Eruption Source Parameters of volcanic eruptions based on mass per unit area or thickness measurements of tephra fall deposits. Outputs of the algorithm are presented as sample posterior distributions for variables of interest. Basic elements in the algorithm and how to implement it are introduced. Experiments are done with synthetic datasets. These experiments are designed to demonstrate that the algorithm works from different perspectives, and to show how inputs affect its performance. Advantages of the algorithm are that it has the ability to i) incorporate prior knowledge; ii) quantify the uncertainty; iii) capture correlations between variables of interest in the estimated Eruption Source Parameters; and iv) no simplification is assumed in sampling from the posterior probability distribution. A limitation is that some of the inputs need to be specified subjectively, which is designed intentionally such that the full capacity of the Bayes’ rule can be explored by users. How and why inputs of the algorithm affect its performance and how to specify them properly are explained and listed. Correlation between variables of interest in the posterior distributions exists in many of our experiments. They can be well-explained by the physics of tephra transport. We point out that in tephra deposit inversion, caution is needed in attempting to estimate Eruption Source Parameters and wind direction and speed at each elevation level, because this could be unnecessary or would increase the number of variables to be estimated, and these variables could be highly correlated. The algorithm is applied to a mass per unit area dataset of the tephra deposit from the 2011 Kirishima-Shinmoedake eruption. Simulation results from Tephra2 using posterior means from the algorithm are consistent with field observations, suggesting that this approach reliably reconstructs Eruption Source Parameters and wind conditions from deposits.

Estimating ESPs can be treated as an inverse problem (e.g., Tarantola (2005) and Kaipio and Somersalo (2006)), and requires the use of different statistical and engineering techniques (Suzuki and et al 1983;Carey and Sparks 1986;Sparks et al. 1992;Mannen 2006;Klawonn et al. 2012;Klawonn et al. 2014;Maeno et al. 2014;Biass et al. 2016;Poret et al. 2017;Koyaguchi et al. 2017;White et al. 2017;Yang et al. 2019;Mannen et al. 2020). Previous workers have presented different methods to implement inversion to obtain ESPs from the characteristics of tephra deposits, such as deposit thickness and grain size. The simplex search algorithm, grid-search method, matrix inversion with Tikhonov regularization, and a regularized form of the Levenburg-Marquardt algorithm have been proposed (Connor and Connor 2006;Klawonn et al. 2012;Johnston et al. 2012;White et al. 2017; Moiseenko and Malik 2019;Mannen et al. 2020). The efficiency and ability to characterize uncertainty with various simplifications (such as those used to avoid solving ill-posed problems) are the main concerns in proposing these algorithms as alternatives to classical inversion.
The challenges in estimating ESPs derive from their (1) high-dimensionality (i.e., too many variables to be estimated) and (2) limited field observations (e.g., Green et al. (2016)). Because of the ill-posedness of this inversion, it is important to quantify the uncertainty in the process of estimating ESPs such that we know how certain or uncertain we are about our estimate. In addition, it has been shown that ESPs influence model prediction through interaction with other ESPs (Scollo et al. 2008;. In tephra inversion, such interactions or coupling could potentially lead to correlated results, i.e., estimated ESPs are correlated with one another, which is not always taken into account, or studied in a systematic and statistically formal manner. Markov Chain Monte Carlo (MCMC) methods have the ability to quantify inherent uncertainty and address the presence of correlation between variables of interest in the estimate.
In this work, we present and introduce an algorithm that couples the Metropolis-Hastings (M-H) algorithm (Hastings 1970), one of the most widely-used MCMC methods, with volcanic ash transport and deposition model Tephra2 Connor et al. 2011) for the estimation of ESPs of explosive volcanic eruptions. Advantages in using MCMC methods to estimate ESPs under a Bayesian framework include (1) the estimation can be denoted as a posterior probability distribution, which enables uncertainty quantification; (2) prior knowledge on ESPs and field observations can be combined in a statistically formal way to jointly determine the result; (3) correlations among ESPs and between ESPs and wind conditions can be captured by the algorithm in the estimated result; and (4) no simplification is assumed in sampling from the posterior probability distribution, which guarantees that the results are fully Bayesian. Note that these are not unique to the presented algorithm (e.g., White et al. (2017)). See following sections for comparison between different tephra inversion methods.
Because of these advantages, the algorithm presented herein has the potential to better differentiate and characterize sources of uncertainty, and detect insensitive variables of interest in tephra inversion. Uncertainty always exists in tephra inversion regardless of the method being used. Given its constant presence, it is important and better for us to be able to capture and quantify the uncertainty in a statistically formal manner.
We introduce and demonstrate the algorithm in the following way. We introduce the physical model Tephra2 Connor et al. 2011) first. We then briefly explain Bayes' rule, which is what the M-H algorithm (Hastings 1970) solves numerically. An intuitive interpretation of the M-H algorithm is given. Then we describe in detail the construction and implementation of the M-H algorithm and specific setups of the presented algorithm. We apply the algorithm to simulated (synthetic) datasets, i.e., datasets generated from Tephra2 with known ESPs and wind conditions, to validate the algorithm. Three sets of experiments are done with different purposes. Note that the experiments are done to demonstrate that the algorithm is constructed properly for Tephra2. There is no need to demonstrate the validity of Bayes' rule or the M-H algorithm as they are well-studied and known to work well in inversion problems (e.g., Hastings (1970) and Berger (2013)) In the discussion section, main advantages and limitations of the algorithm are pointed out. Correlation in the posterior distribution between variables of interest in our experiments is explained by the physics of tephra transport. Whether a simplified wind profile should be adopted in tephra inversion is discussed. We apply the algorithm to a dataset consisting of observed mass per unit area data of the tephra deposit from the well-studied 2011 Kirishima-Shinmoedake eruption to estimate its ESPs. The results are in general consistent with observations and estimates from previous studies.
The algorithm is coded in python scripts, and is published on vhub (https://vhub.org/resources/4614). To make the work accessible to a broad audience, we minimize the use of mathematical and statistical terms in introducing Bayes' rule and the M-H algorithm. We hope that the algorithm can benefit researchers with interest in estimating ESPs of volcanic eruptions regardless of their backgrounds, and the text can serve as a tutorial to potential users.

Volcanic ash transport model Tephra2
Tephra2 is a widely-used volcanic ash transport and deposition model Connor et al. 2011). It has been coupled with different statistical and engineering techniques for forward and inverse modeling of tephra fall deposits and volcanic hazard analysis (Connor and Connor 2006;Mannen 2006;Volentik et al. 2010;Fontijn et al. 2011;Biass et al. 2012;Mannen 2014;Magill et al. 2015;Biass et al. 2016;Biass et al. 2017;Takarada 2017;Wild et al. 2019;Connor et al. 2019;Mannen et al. 2020;Williams et al. 2020). Tephra2 assumes that tephra particles with different grain sizes are released from a vertical column with column radius increasing with height (accounted for by an additional diffusion term; Suzuki and et al (1983)), and their transport is subject to wind advection, horizontal turbulent diffusion, and falling at terminal velocities. Inputs of Tephra2 include total eruption mass and total grain size distribution, and other parameters to characterize the eruptive column and wind conditions.
Tephra2 gives semi-analytical solution to the advectiondiffusion equation, and its output is the tephra mass per unit area deposited and grain size distribution at user-specified locations. Tephra2 assigns the total erupted mass M 0 to grain size bins (in φ unit) based on the specified grain size distribution. The total mass for each grain size is distributed along the eruptive column (discretized to points). The mass distribution is described by a beta probability density function characterized by α and β.
When tephra particles with grain size φ j released at the height of H i (their total mass: M i,j ) settle and deposit on the ground, the corresponding spatial distribution of mass per unit area (m i,j (x, y)) of the deposit is proportional to a 2D Gaussian function, and can be written as: where (x, y) is the spatial coordinates, and f i,j (x, y) is the 2D Gaussian function with its mean and variance depending on the grain size j, released elevation H i , wind speed and direction, and parameters that characterize turbulent diffusion (e.g., turbulent diffusion coefficient). The total mass per unit area at (x, y) is the sum of Eq. 1 for all particle sizes released from the eruptive column. As the eruptive column, a line source, is discretized into many point sources in Tephra2, the total mass per unit area can be written as: To run Tephra2, ESPs, wind conditions, and locations of interest need to be specified. Tephra2 discretizes the atmosphere into multiple horizontal layers. The number of horizontal layers and their elevations as well as the corresponding wind speeds and directions need to be specified as wind conditions by users to run Tephra2. See more information on the use and implementation of Tephra2 in Connor et al. (2011), Mannen (2014, and Connor et al. (2019).

Bayes' rule
Simply put, Bayes' rule states that our prior knowledge about certain quantities of interest can be updated based on new observations. Assuming that the quantities of interest (i.e., ESPs in this study) is a vector x, the prior knowledge on its value can be denoted as a prior probability distribution P(x). Here the prior distributions should be specified in a way such that they truthfully reflect how certain, or equivalently, how uncertain we are about the values of variables to be estimated. Discussions on the Bayes' rule and how to properly specify the priors could become either too abstract or technical, and are too broad to be within the scope of this work. Philosophical, comprehensive, and detailed discussions on Bayesian statistics can be found in Berger (2013).
With a series of observations θ (i.e., mass per unit area of tephra deposit on the ground in this study), our understanding on x could be updated, which is denoted by the posterior probability distribution (P(x|θ)). Bayes' rule is written as: where P(θ|x) is the likelihood function. It denotes the probability of observing θ given x. P(θ) is the evidence P(θ) = P(θ|x)P(x)dx, and is the total probability of the observations. Here we refer to probability density as probability for convenience. We can assimilate new observations (e.g. tephra observations) to obtain the posterior distribution with the help of the likelihood function. In our case, this cannot be done without running the model Tephra2. We use the simplest case with only one variable of interest unknown, say column height, and one mass per unit area observation (θ * ) to explain the likelihood function. We could apply one value of column height (h * ) to Tephra2, and collect the corresponding output d(x = h * ) with d(·) denoting the Tephra2 output (assume one location of interest in this example). Neglecting model uncertainty, knowing x = h * is equivalent of knowing d(x = h * ). Further assuming that the likelihood function follows a Gaussian distribution, its mean value could be d(x = h * ), and its variance needs to be determined based on our understanding of the data (e.g., the variance should scale with measurement uncertainty; see (Kawabata et al. 2013;Green et al. 2016) for more information on selecting the likelihood function).
If the true column height that generates θ * is 10 km, we expect to see the likelihood function having a greater value if h * is closer to 10 km-the probability of observing θ * is greater when h * is closer to 10 km. In general, the likelihood function should have a greater value, if the model output is similar to the observation. This is also why a Gaussian distribution centered at the model output can be used as one form of the likelihood function. The scale of the likelihood function, which is standard deviation in this example, reflects the scale of measurement uncertainty in the present context. If multiple observations are made, by assuming that each observation is made independently, the likelihood function could be the product of likelihood function for each observation (as adopted in this work).
Here it should be noted that with multiple observations, each observation could have different (assumed) measurement uncertainty (e.g., measured thicknesses of tephra deposits from sediment cores and on land could have different measurement uncertainty; when implementing the inversion based on the amount of each grain size, measured amount of each grain size could have various level of measurement uncertainty). This would change the shape of the likelihood function, and additional attention is needed when constructing the likelihood function in such cases. This is not considered in the present work, but could be adjusted in the algorithm based on specific needs of users.
Constructing the likelihood function properly requires our knowledge on the observation dataset, and the form of the likelihood function could vary case by case. See (Kawabata et al. 2013;Green et al. 2016;Engwell et al. 2013) for more detailed discussion on how to select the form of the likelihood function properly and how to quantify measurement uncertainty of tephra fall deposits.
To obtain the posterior probability distribution, we need P(x), P(θ), and the likelihood function P(θ|x). The prior distribution P(x), the likelihood function P(θ|x), and P(θ) (by definition) need to be defined beforehand based on prior knowledge about the ESPs and measurement uncertainty.
The major difficulty in analytically deriving the posterior distribution comes from the fact that the likelihood function would become almost certainly non-parametric in practice. This would make the value of P(θ) hard to calculate (although it is a constant), and is related to practical issues such as the high dimensionality of the parameter space. Therefore, the posterior distribution is frequently obtained through numerical sampling methods.

The M-H algorithm
MCMC methods, a class of methods that draw samples from a target distribution, can be used to sample from the posterior distribution based on P(θ|x)P(x), the numerator of the right-hand side of Eq. 3. In this way, the difficulty in calculating P(θ) can be avoided. In volcanology, MCMC methods have been widely adopted for various purposes, such as estimating parameters and initial conditions of a physical model (which is similar to the goal of the present work), determining ages of volcanic events, and hazard forecasting (e.g., Green et al. (2016) Wang et al. (2020), and Liang and Dunham (2020)). (Green et al. 2016) used one MCMC method to estimate volumes of tephra fall deposits based on sparse and incomplete observations, and their work used a semi-empirical model to characterize tephra thickness distribution.
The procedure adopted in this work, the M-H algorithm, is one of the most widely used MCMC methods (Hastings 1970). A brief introduction to the algorithm is given below. More information about the algorithm can be found in textbooks and published articles (e.g., Chib and Greenberg (1995), Andrieu et al. (2003), and Kaipio and Somersalo (2006)).
The M-H algorithm draws a series of sample points following certain rules. Each sample point corresponds to one set of ESPs and wind conditions (one value for each variable) that are to be estimated. These rules are determined by the prior distribution and the likelihood function. With sufficient draws, it is guaranteed that the distribution of the drawn samples converges or approximates the target probability distribution, namely the posterior distribution in our case, regardless of the starting sample point. How the algorithm works can be generally described as follow.
With a random starting point x 0 (i.e., the first sample) and corresponding observations θ, the algorithm proposes a new point x * 1 using a proposal function that is known and easy to sample. In this work, we use one of its most common forms, a Gaussian probability density function centered at the previous point (i.e., x 0 for the first draw). The variance of the proposal function needs to be defined subjectively which will affect the efficiency of the M-H algorithm, and will be illustrated in later experiments. Proposing a new point x * 1 is thus equivalent of drawing one sample from a Gaussian probability distribution.
By calculating and comparing P(θ|x 0 )P(x 0 ) with P(θ|x * 1 )P(x * 1 ), the algorithm decides whether to accept or reject x * 1 . Note that we need to know values of the prior (P(x 0 ) and P(x * 1 )) and likelihood function (P(θ|x 0 ) and P(θ|x * 1 )) to do the comparison. The latter requires us to implement Tephra2 to obtain values of the likelihood function.
If x * 1 is rejected (the rejection rule is introduced in the next paragraph), x 1 = x 0 . Otherwise, x 1 = x * 1 . The two procedures, namely drawing a new sample point and rejecting or accepting it, iterate, and after sufficient iterations, a chain of vectors x 0 , x 1 , ..., x n is obtained. By (1) excluding points with relatively small index (e.g., x 0 ,.., x 499 ), and (2) taking points with a fixed interval along the chain (e.g., only taking x 500 , x 600 , ..,x 9900 , x 10000 ), the target posterior distribution is obtained through sampling. The first measure is to make sure that the results are not affected by the value of the starting point (x 0 ), and the second is to avoid auto-correlation in the chain (see Chib and Greenberg (1995), Andrieu et al. (2003), and Kaipio and Somersalo (2006) for more details).
Whether to accept or reject a proposed point follows the rules below: • If P(θ|x * 1 )P(x * 1 ) > P(θ|x 0 )P(x 0 ), then the posterior probability is greater at x * 1 , and the proposed point will be accepted. That is, if the proposal has a higher posterior probability it is automatically accepted. Following this rule ensures that there will be more samples with greater posterior probability in the chain.
• If P(θ|x * 1 )P(x * 1 ) < P(θ|x 0 )P(x 0 ), then the algorithm accepts x * 1 with probability P(θ|x * 1 )P(x * 1 )/P(θ|x 0 )P(x 0 ). This allows the algorithm to occasionally sample points with low or relatively low posterior probability, in order to explore the entire possible domain of x. Following this rule implies that: -If P(θ|x * 1 )P(x * 1 ) is a lot smaller than P(θ|x 0 )P(x 0 ), the posterior probability at x * 1 is small, and should be accepted with low probability. This ensures that there will be fewer points with low posterior probability in the chain.
-If P(θ|x * 1 )P(x * 1 ) is only slightly smaller than P(θ|x 0 )P(x 0 ), the probability of accepting x * 1 is relatively greater. In such a case, the algorithm encourages (with high probability of acceptance) keeping x * 1 and further exploring points around (sharing similar values with) x * 1 .
The second rule is critical to the M-H algorithm. Instead of searching for the values that maximize the posterior probability (maximum a posteriori estimation), the algorithm functions through sampling from the target distribution.

Form of the likelihood function
In the present version of the algorithm, it is assumed that the likelihood function for each observation follows a Gaussian distribution with variable log 10 ( observation model output ). The mean of the distribution centers at 0 such that the likelihood function peaks when the observation is identical to the model output. This setup is consistent with previous works. It states that measurement uncertainty scales with magnitude of the observation (e.g., Connor and Connor (2006), Kawabata et al. (2013), and White et al. (2017)). The standard deviation or scale of the likelihood function, which reflects measurement uncertainty, needs to be specified by users. Its effect on the results of the algorithm will be examined in the following experiments. It should be noted that the form of the likelihood function could be changed to different forms in the algorithm.

Two ways to specify the wind profile
The algorithm allows for two ways to specify and estimate the wind profile, and users could decide which way to use, and how to construct the parameterization (i.e., determine what variables are known, and what are to be estimated for the wind profile). In cases with limited observations, it is challenging to estimate the wind speed and direction at each elevation (see White et al. (2017) for successful examples). In such cases, a simplified form of the wind profile can be adopted in the algorithm. It assumes that (1) the wind direction is constant, and does not change with elevation, and (2) wind speed increases from zero to a certain (a maximum wind speed) value with elevation, and then decreases to zero with elevation linearly. Four variables define such a simplified wind profile. They are the wind direction, maximum wind speed, elevation that corresponds to the maximum wind speed, and elevation that the wind speed reaches zero. Users could specify which of the four variables are to be estimated. Prior distributions of these variables also need to be specified if they are to be estimated. This wind profile setup is similar to the wind speed profile adopted by Carey and Sparks (1986).
In the second way to specify the wind profile, users could determine whether wind speed and direction at each elevation level are known (i.e., do not need to be estimated and kept fixed in implementing the algorithm) or not (i.e., need to be estimated). If the wind speed and direction at certain or all elevation levels need to be estimated, users need to specify the prior for each of these variables. For recent eruptions, users could adopt best-fit historically observed wind profiles, or use them to construct the priors for wind conditions with the non-simplified wind profile. For example, NOAA NCEP/NCAR REANALYSIS data (Kalnay et al. 1996) provide a large number of wind profiles globally, based on best-fit wind profiles to observations via weather models (e.g., Connor et al. (2019) and Mannen et al. (2020)).

Running the algorithm
To run the algorithm, users first need to prepare input files. Examples of input files can be found in Table 1. They have five columns including the "variable name" column. The "initial value" column specifies the starting value of each variable. If the value of one certain variable is known (not to be estimated), its initial value will be fixed during the implementation of the algorithm, and the "prior" column needs to be marked as "Fixed". For such variables, users could leave the last three columns blank. To estimate a non-simplified wind profile, users must specify the wind direction and speed at the specified elevation levels, depending on the form of the assumed profile.
For variables to be estimated, their initial values do not affect the results as long as the specified number of draws is sufficiently large. The only requirements for the initial values are that they need to be physically possible, and their values would not lead to an error when running Tephra2. In practice, having the initial values located within a meaningful region (for example, means of the prior distributions) could effectively avoid exploring (sampling) values that are less likely to be accepted. Forms of prior distributions for these variables need to be defined in the "prior" column. The current version of the algorithm supports "Gaussian" and "Uniform" distributions. Including more forms (e.g., log-normal distribution and bounded Gaussian distribution) of the prior distribution in the algorithm will be one goal of our future work for its improvement. If the former is specified, columns "param-eter_a" and "parameter_b" should be filled with the mean and standard deviation of the prior distribution, respectively. Otherwise, the two columns correspond to the minimum and maximum of the uniform prior distribution. The last column "draw_scale" specifies the standard deviation of the proposal function for each variable to be estimated.
After the input files are prepared, users need to run all python scripts in order to execute the algorithm. Other than setting up proper file paths to read input files and observed data, and store the results, users only need to specify two values, namely the scale of the likelihood function and the number of draws (length of the chain), to run the algorithm. How to determine the number of draws is universally challenging when working with MCMC methods. It is preferred to have a large number of draws Table 1 Examples of input files of the algorithm. Tables on the left and right correspond to input files that specify the ESPs and simplified wind profile, respectively so that measures can be done to reduce autocorrelation in the sampled chain (e.g., leave out samples at the beginning of the chain and taking samples along the chain with a fixed interval as the final sample distribution; see (Chib and Greenberg 1995;Andrieu et al. 2003;Kaipio and Somersalo 2006) for more details), and the results are less likely to be divergent. Here we recommend the number of draws to be ranging from five thousand to one million. This is based on our experiments with the algorithm. The basic principle is that more number of draws are necessary if more variables are to be estimated. This is because with more variables to be estimated, the dimensionality of the input space increases, and more draws are needed to explore the input space (i.e., a lot more possible combinations of input variable values need to be explored). One million draws are indeed not trivial, but given the low computational cost, having one million runs with Tephra2 is not impractical.
Users could check for convergence by running two or more separate runs with identical inputs except for the starting values. If sample distributions from the runs are similar to each other, results from them converge. Otherwise, users need to increase the number of draws, and run the algorithm and check again following the same procedure. See (Andrieu et al. 2003) and references within for more information on how to implement the M-H algorithm properly.
The primary output from the algorithm is the sampled chain. If 10000 draws are specified, the sampled chain will be a 10000-by-23 (19 variables for the ESPs plus 4 variables for the wind profile) matrix if a simplified wind profile is adopted. Otherwise, with the non-simplified wind profile scenario, the result will be presented as three separate matrices for the eruptive column (10000-by-19 matrix) and wind speed and direction at each elevation (two 10000-by-40 matrices if the specified elevations are 1000, 2000,...,40000 m), respectively. Variables with known and fixed values will remain constant in the corresponding columns.
The algorithm also produces the log-transformed values of the prior probability and likelihood function of the proposed samples at each draw, log-transformed posterior probability for each accepted sample on the chain, and the number of acceptances from the run as outputs. They could potentially help users to debug and adjust parameters to run the algorithm. The ESPs and wind conditions with the highest posterior probability are mostly likely to reproduce observations. Their values can be found by examining histograms of resultant samples or finding their medians.
The running time of the algorithm depends (roughly linearly based on our experiments) on the specified number of draws, the running time of Tephra2, and the number of observations. On an iMac with a 3 GHz Intel Core i5 processor, implementing the algorithm with ten thousand draws and 30 observations takes 394 seconds (∼6.5 minutes). More details on how to implement the algorithm can be found in the documentation file of the algorithm on vhub (https://vhub.org/resources/4614; ).

Results
In this section, we generate simulation data using Tephra2. The simulated data are treated as field measurements for testing and validation of the algorithm. It should be noted that the goal of this work is to present and validate this algorithm and introduce and explain how it works. There is no need to question Bayes' rule and the adopted M-H algorithm. We avoid using real thickness or mass per unit area datasets of tephra fall deposits in testing the algorithm. This would avoid additional sources of uncertainty from affecting the results (e.g., model uncertainty and variable level of post-eruption erosion and compaction of tephra deposits).
It is difficult to validate the algorithm even with synthetic datasets, because (1) outputs of the algorithm are posterior probability distributions, not individual optimum values. Because of this, it is difficult to tell that the method works in a single experiment: it is perhaps easier to tell whether the posterior mean is estimated correctly or not, but how can we check for uncertainty? It is difficult to calculate the posterior standard deviation analytically; (2) in complex scenarios (i.e., a lot of variables are to be estimated), it is possible that posterior distributions cannot be updated from the priors. For example, when the variable to be estimated is not sensitive to model output. In such cases, the not-updated posterior distribution is the right answer, because our prior knowledge on this variable cannot be updated based on new observations; (3) Whether the prior and posterior distributions are poorlyor well-constrained is always relative, which depends on numbers of observations and variables to be estimated and measurement uncertainty.
Therefore, we think that comparison is key to demonstrating the validity of the algorithm. By comparing whether the posterior uncertainty changes accordingly with the inputs and with expectations from Bayes' rule, we examine whether the posterior uncertainty is estimated correctly. For example, if we increase the assumed measurement error in an experiment, we would expect to have a greater posterior uncertainty for the ESPs to be estimated in the result. However, because of the second difficulty in validating the algorithm listed above, the comparison would only be significant and discernible in simple scenarios (i.e., fewer variables to be estimated).
With these concerns, three sets of experiments are done with different purposes. In the first two sets of experiments, the algorithm with the simplified wind profile is adopted. The experiment in Set 3 works with the non-simplified wind profile.
The first set of experiments is designed to test when given different inputs, whether the resultant posterior distributions from the algorithm would behave consistently with expectations from Bayes' rule, and to illustrate how specifications of inputs to the algorithm affect the results (sampled posterior distributions). We thus intentionally keep the scenarios in Set 1 experiments simple, i.e., estimating column height and total eruption mass with ten or thirty observations. In this way, we know how the posterior distributions behave given different inputs (e.g., different data quality and priors) based on Bayes' rule. In these simplified scenarios, we exclude potential impacts from the model (e.g., whether certain variable is sensitive to the output or not) on the estimated posterior distributions. Therefore, albeit simplified, Set 1 experiments are considered necessary and strict measures to demonstrate that the algorithm is constructed properly. In these simplified experiments, we also adjust the number of input observations (dataset size) and sample site locations to showcase that the algorithm behaves consistently with the Bayes' rule in different scenarios.
In Set 2 experiments, we estimate posterior distributions of eight variables with poorly-constrained priors (except for column height as it is investigated in Set 1 experiments). These scenarios are more comparable to a real-world problem of tephra inversion.
The experiment in Set 3 is done to show that (a) the algorithm is able to estimate wind speeds and directions at different elevations when the problem is known to be solvable (with the non-simplified wind profile); and (b) the estimation is not easy due to the number of variables to be estimated (i.e., high dimensionality of the input space) for the algorithm. Details and results of the three sets of experiments are given below.

Set one experiments Experiment setup
For Set 1, twelve experiments are done. We set column height and total eruption mass as our variables of interest, and their true values are 15000 m and 1.88 × 10 11 kg (25.96 = log(1.88 × 10 11 ) log-transformed kg), respectively. Here we use the natural logarithm, but it can be easily adjusted in the present version of the algorithm based on the need of its users. The choices would not affect any results or conclusions in the following experiments. Values of other ESPs and wind conditions used to run Tephra2 to generate the "observations" are assumed to be known (i.e., fixed values in the experiments) and listed in Table 2, and are fixed in all experiments in Set 1. The simplified wind profile is adopted throughout the 12 experiments. Given purposes of Set 1 experiments, focusing on just two variables makes it easier for examining and interpreting the results. The first experiment, Experiment # 0, is used as reference for comparison with results from the rest. For the rest experiments, we just modify one input of the algorithm or the observation dataset, and keep the others the same as they are in Experiment # 0. In this way, the impact of each factor on the performance of the algorithm can be isolated and highlighted. Specified inputs of the 12 experiments as well as changes made in each experiment are highlighted in Table 3.
For the 12 experiments, the M-H algorithm is set to draw 10000 points (10001 points in the chain including the starting point). After each experiment is finished, we post-process the results by taking the first 1000 points out of the chain, and collecting samples from the rest of the chain by a 15-points interval (only taking 1015th, Table 2 ESPs and wind conditions used to generate "field observations" for validation of the algorithm. ESPs for Sets 1 and 2 experiments with the simplified wind profile (orange-striped cells) and the Set 3 experiment with the non-simplified wind profile (blue-striped cells), respectively Table 3 Specifications and input observation data used to run Experiments # 0-11 of Set 1. Differences in the specification or input observation data in each experiment compared to Experiment # 0 (as reference experiment; marked as green cells) are highlighted in yellow 1030th, ..., 10000th points in the chain) and discarding all other points. These measures are adopted to prevent the results from being affected by the initial starting point and autocorrelation as mentioned earlier. For two variables of interest, a chain with 10000 draws is sufficient enough for the samples to converge to the posterior distribution. Summary of the sampled posterior distributions is given in Table 4.

Reference experiment
In Experiment # 0, ten observations are sampled at localities far from the source vent downwind (∼ 14 − 30 km from the vent ; sample sites shown in Fig. 1a). Their values range from 50-383 kg/m 2 . Gaussian distributions are assumed for the column height and log-transformed total eruption mass. Means and standard deviations are 16000 m and 2000 m for column height, and 25.96 and 2 log-transformed kg (1.88 × 10 11 kg) for eruption mass. It is noted that 16000 m is slightly greater than the specified column height used to generate the dataset, and the prior mean for the log-scaled eruption mass is identical to the true value (Table 3). The scale of the likelihood function is set to be 0.05, which corresponds to ∼ 11.6% of relative measurement error (this can be calculated based on how the likelihood function of the algorithm is defined; see text above).
The results are shown and summarized in Table 4 and

Effects from specifications required by the m-H algorithm
Scale of the likelihood function In Experiment # 1, we increase the scale of the likelihood function from 0.05 to 0.2 (Table 3), which corresponds to 53.4% of relative measurement error. A greater scale of the likelihood function implies that a greater measurement uncertainty is assumed (neglecting model uncertainty). From Bayes' rule, we know that compared with results from the reference experiment, posterior distributions of Experiment # 1 would not be greatly updated.
As expected, the resulting sampled posterior distribution of column height from Experiment # 1 is not significantly different from its prior. Its posterior mean and standard deviation are 15990 m and 1775 m, respectively (prior mean and standard deviation: 16000 m and 2000 m). In this experiment, observations are made, but because a larger measurement uncertainty is assumed for them, the algorithm does not trust these measurements as it does in the reference experiment. This is also manifested in the acceptance rate of Experiment # 1, which is 84.0%. What the M-H algorithm does here is basically accepting or rejecting points based on the prior distribution, and the likelihood function cannot help determine whether to accept or reject the proposed samples due to the greater scale specified (i.e., greater measurement uncertainty).
The posterior distribution of the log-transformed total eruption mass still centers close to the true value with lowered standard deviation compared to its prior, suggesting that estimating total eruption mass is less sensitive to measurement uncertainty. This is consistent with the argument from (Scollo et al. 2008) which states that total eruption mass is a crucial eruption parameter that would greatly affect Tephra2 outputs by itself.

Scale of the proposal function
In Experiment # 2, scales of the proposal function are increased from 500 m and 0.05 log-transformed kg to 2000 m and 0.2 logtransformed kg for column height and log-transformed total eruption mass, respectively (Table 4). With greater scales of the proposal functions, the algorithm is more likely to propose a new point that is far from the current point. We know from the theory of the M-H algorithm that this would not affect the resultant sampled posterior distributions. This is confirmed through comparing results in Experiments # 0 and 2 which are similar to each other (Table 4).
Nonetheless, scales of the proposal functions would affect the efficiency of the algorithm and sometimes its performance when the number of draws is not large enough. The algorithm is being too "adventurous" with proposal functions characterized by greater scales: they tend to explore (propose) values that are greatly different from the current point. Such values are less likely to be accepted especially when the current values are characterized by high posterior probability (acceptance rate: 13.6% for Experiment # 2). The greater probability of rejection reduces the efficiency of the algorithm. Users could adopt the suggested measures (introduced in previous section) to check for convergence first. If the results converge, there is no need to worry about the greater probability of Based on Bayes' rule, we know that with the overconfidence in specifying the priors in Experiments # 3 and 4, the ten observations are probably not sufficient enough to "drag" the posterior distributions to be centered at the true values. We expect to see the posterior distributions of one or both of the variables to be centered in between the true values and means of the specified priors.
The results are consistent with this expectation, and show that the ten observations are not powerful enough to correct both priors in the two experiments. Posterior means of the two experiments are 14283 and 12666 m for the column height (std: 442 and 459 m), and 25.997 and 26.138 log-transformed kg for the log-scaled total eruption mass (std: 0.048 and 0.059 log-transformed kg), respectively. Experiment # 3 with the specified prior closer to the true value has its posterior distribution of column height closer to the true value compared to Experiment # 4. The "incorrect" results in the two experiments represent what we expect to see based on Bayes' rule, and support the validity of the algorithm.
Posterior distributions of total eruption mass in both experiments are centered near the true value of logtransformed total eruption mass. This again is consistent with the interpretation about the total eruption mass from (Scollo et al. 2008).

Number of input observations
Experiments # 5-9 share the same specifications with Experiments # 0-4, respectively, except that 20 more samples are included in the input. The 30 measurements (see Fig. 1 for sample localities) range from 32-383 kg/m 2 . The results are compared pairwise, and summarized in this section. With more observations, we expect to see the posterior distributions being improved (either with reduced uncertainty or the posterior means closer to the true values) compared to Experiments # 0-4 with fewer observations.
Comparison between Experiments # 0 and 5 shows that they have similar posterior means that are consistent with true values of column height and log-transformed total eruption mass. The corresponding standard deviations are smaller in Experiment # 5 (685 m and 0.039 logtransformed kg for column height and log-scaled eruption mass). This confirms that more observations reduce the uncertainty in the posterior distribution. The same argument can be made for Experiment # 7 as it has the same specifications as Experiment # 5 except for greater scales of the proposal functions (which would not affect the posterior distribution, and is discussed in Experiment #2).
Even with more observations, the posterior distribution (mean: 15786 m; std: 1633 log-transformed kg) of column height in Experiment # 6 (which assumes a greater measurement error) is not greatly updated from the prior. This is again due to the greater likelihood scale specified in Experiment # 6. The assumed uncertainty in the measurement is too large that 30 observations are not sufficient enough to greatly update the prior.
Experiments # 8 and 9 with incorrect and confident priors have their column height posterior means (14483 and 13340 m) slightly closer to the true value compared to results from Experiments # 3 and 4 (14283 and 12666 m). Posterior means of log-transformed eruption mass in the two experiments are close to the true values (25.986 and 26.070). It can be seen that more observations "drag" the posterior distributions of column height towards the true value in the two experiments. Results from experiments # 3, 4, 8, and 9 reflect the "wrestling" between incorrect prior knowledge and informative observations. The posterior distributions in Experiments # 8 and 9 are also characterized by lower standard deviations (Table 4). These all conform with our expectations based on Bayes' rule and previous study (Scollo et al. 2008) on the sensitivity of total eruption mass in Tephra2.

Sample site locations
In Experiments # 10 and 11, ten measurements at proximal (and closely-spaced) and medial localities are used as input observations, respectively (see sites in Fig. 1a). Here the proximity is defined relatively, but this would not affect any results or conclusions in these experiments, as our argument that the algorithm is constructed correctly is demonstrated through comparison. We only consider cases with sample sites located in the downwind area with respect to the source vent. This is because the main goal of this experiment is to present and introduce the algorithm, not to explore under what conditions it would be hard for the algorithm to effectively update the prior.
In Experimenet # 10, the sample sites (yellow points in Fig. 1a) are all clustered and located close to the vent. In theory, they cannot provide too much useful information for the Bayes' rule to greatly update the posterior distributions, because the sample sites are too close to each other, and hence the observations are similar to each other. This is consistent with the results: the sampled posterior distributions from Experiment # 10 are not greatly updated from the priors. The posterior mean and standard deviation of the column height (16085 m and 1863 log-transformed kg, respectively) are similar to those of the prior (Table 4). The posterior mean of the logtransformed eruption mass is 26.106 log-transformed kg, and is characterized by a relatively greater standard deviation (0.280 log-transformed kg) compared to results from other experiments.
In Experiment # 11, the sample sites are spatially medial to the source vent, i.e., neither especially close nor far relative to the entire span of distances (pink points Fig. 1a). The sites are distributed in a more scattered pattern than are those in Experiment # 10. These observations provide more useful information than do the proximal observations (Experiment # 10), which can be used to update the priors. This is consistent with results of Experiment # 11. Table 4, the posterior means of the column height and log-scaled eruption mass are consistent with the true values, and the posterior distributions are also characterized by smaller standard deviations compared to those in Experiment # 10 and the specified priors.

As shown in
Resultant posterior distributions from Set 1 experiments are consistent with our expectations based on Bayes' rule, suggesting that the presented algorithm is constructed properly for Tephra2 in these simplified scenarios. We hope that these experiments could also help potential users understand how and why inputs of the algorithm affect its performance.

Set two experiments
In Set Two, two main experiments plus two supplementary experiments are implemented. The two main experiments are done to show that the algorithm is able to estimate a set of ESPs and wind-related variables. In the two main experiments, we find that posterior distributions of diffusion coefficient and column height are similar to their priors. Two supplementary experiments, which will be introduced later, are thus proposed to show that the variable diffusion coefficient can be well characterized in simpler scenarios. For column height, the posterior distribution is not updated because a relatively strong prior is specified.
In the two main experiments, we estimate eight variables of interest with the simplified wind profile: column height, total eruption mass, α (which characterizes tephra mass distribution along the column; β is fixed to be 2 in all Set 2 experiments; see Table 2), median and standard deviation of total grain size distribution, diffusion coefficient, wind direction, and maximum wind speed. We do not attempt to estimate more ESPs because other ESPs are in many cases well-constrained (e.g., vent coordinates and elevation, maximum and minimum grain size considered). The only difference between the two main experiments is that different levels of relative measurement error are assumed, which are 28.9% (Experiment # 1; likelihood scale: 0.12) and 14.0% (Experiment # 2; likelihood scale: 0.06), respectively. These values are set arbitrarily, but since the goal to have two experiments (two measurement errors) is to see whether they would affect the resultant posterior distributions, these arbitrary decisions would not affect the motif herein.
For all four experiments (two main experiments plus the two supplementary ones) in Set Two, the ESPs and wind conditions used to generate the "observations" are the same as in Set One experiments, and are listed in Table 2. All sample sites shown in Fig. 1a are the assumed sample sites, and Tephra2 outputs at these sites are assumed to be the "observations". We assume that we have little knowledge on the ESPs and wind conditions that are to be estimated in the two main experiments except for column height, which has a prior Gaussian distribution with mean and standard deviation being 15000 and 2000 m, respectively. This prior is relatively well-constrained in Set 2 main experiments given the number of variables to be estimated and the amount of observations. All priors are listed in Table 5.
In each experiment, five hundred thousand samples are drawn. The first 10000 samples are abandoned, and the sample interval is set to be 50 to avoid auto-correlation (for each experiment, three runs are done to check for convergence). The same implementation is done for the supplementary experiments.
Means and standard deviations of resultant posterior distributions in the two main experiments of Set Two are listed in Table 5. Most posterior means are at or close to their true values with greatly reduced uncertainty compared to the corresponding priors. The exceptions are the diffusion coefficient and column height. For the former, the corresponding posterior means are 6647 and 6248 m 2 /s (true value: 5000 m 2 /s) with standard deviations of 1877 and 1524 m 2 /s, respectively, in the two experiments. Its sampled posterior distributions also resemble a uniform distribution, which is how the prior is specified.
The posterior distribution of column height is not greatly updated from the prior in the first main experiment with a greater assumed measurement error (posterior mean and std: 14828 and 2025 m). For the second experiment with a smaller measurement uncertainty, the corresponding posterior standard deviation becomes slightly smaller (posterior mean and std: 14550 and 1690 m).
As a smaller measurement error is assumed in the second main experiment of Set 2, based on Bayes' rule, we expect to see better-constrained posterior distributions (i.e., with lower posterior standard deviations) in that experiment. This is confirmed in our results. As shown in Table 5, the posterior standard deviations in the second experiment of Set 2 are all smaller compared to those from the first experiment. This is consistent with Bayes' rule.
We also list posterior correlations between variable pairs from the two main experiments in Table 6. It can be seen that several variable pairs are characterized by high-magnitude correlation, suggesting that the interaction between ESPs and wind conditions is critical, and should not be ignored in tephra inversion.

Supplementary experiments
From the two main experiments, we find that variables diffusion coefficient and column height are relatively hard to constrain. Two additional, supplementary experiments are thus done to prove that the variable diffusion coefficient can be well-estimated in simpler scenarios. We do not experiment with column height here as experiments Table 5 True values of the ESPs and wind conditions to be estimated, their assumed priors, and means and standard deviations of the resultant posterior distributions in Set 2 experiments in Set 1 have proved that it can be well-estimated in simpler scenarios.
We decide to estimate α (which characterizes tephra mass distribution along the column), diffusion coefficient, wind direction, and maximum wind speed of the simplified wind profile in the supplementary experiments. We assume that all other ESPs and wind conditions are known. The priors of these variables are the same as in the two main experiments, and are shown in Table 5. The only difference between the two is the assumed relative measurement errors, which are 13.5% and 6.7%, respectively. Again, it is through comparison that we validate the algorithm. Exact values of the two would not affect the motif herein. Similarly, the other three variables are chosen arbitrarily, but would not affect conclusions or results in these experiments given the goal of the supplementary experiments.
Posterior means and standard deviations of the variables to be estimated are shown in Table 5. It can be seen that variables other than the diffusion coefficient are well-constrained in both experiments. In terms of diffusion coefficient, the supplementary experiment with a greater measurement uncertainty has its posterior distribution centered at 5403 with a standard deviation of 1142. In the experiment with a smaller assumed measurement uncertainty, the resultant posterior distribution finally becomes close to the true value (posterior mean: 5078), and the posterior standard deviation is also smaller (543) compared to the other supplementary experiment.
Results in the two supplementary experiments suggest that the variable diffusion coefficient can be wellestimated in simpler scenarios. Comparing posterior distributions from the two suggests that its posterior distribution is hard to constrain given limited observations compared with the other ESPs.
Together with the fact that the M-H algorithm is a generalized probabilistic inversion algorithm, results from Table 6 Posterior correlation table for results from the two main experiments in Set 2. Blue and pink cells correspond to results from the first (with greater assumed measurement error) and second (with lower assumed measurement error) experiments, respectively. Correlations with magnitude above 0.5 are marked Sets 1 and 2 experiments suggest the validity of the presented algorithm.

Set three experiment with non-simplified wind profile
Results from one experiment with non-simplified wind profile are presented in this section. The ESPs and wind conditions to generate the synthetic data are shown in Table 2. Change in wind speed and direction with elevation is taken into account in this experiment. The wind speed is specified to increase from 0 on the ground to a maximum of 70 m/s at 16 km, and then decrease to 10 m/s at 24 km. The wind direction is to the north on the ground, and gradually changes to eastwards with elevation.
We set our ESPs of interest to be column height and logtransformed total eruption mass. For the wind profile, we choose to estimate wind directions and speeds at elevation levels 5, 14, and 18 km a.s.l. Wind speeds and directions at other elevations are assumed to be known. This amounts to eight variables: the two ESPs and wind directions and speeds at the three elevation levels.
We do not choose to estimate wind direction and speed at all elevation levels (i.e., estimate the complete wind profile) because that would significantly increase the number of variables to be estimated. In such circumstances, the problem could become hard to solve as the increased dimensionality of the input space makes it hard for the algorithm to draw samples efficiently. At the same time, we know that wind speed and direction only affect wind advection in Tephra2. This means that different combinations of wind speeds and directions at various elevations could lead to similar simulated results in Tephra2, and that they could remain relatively independent from other ESPs in many cases. This is comparable to vector decomposition: one vector (the total distance each tephra particle travels due to advection in Tephra2) can be decomposed to the sum of infinite combinations of two or more vectors (distances each tephra particle travels within two or more elevation layers).
Therefore, an experiment that estimates the wind direction and speed at each elevation cannot be used to justify the method works even if sufficient draws are made. With poorly-constrained priors, we know that the posterior distributions cannot be greatly updated (because we know that a lot of local minima exist, and we are given limited observations given the number of variables to be estimated). On the other hand, we cannot tell whether the priors are specified properly: given the high dimensionality, it would be hard for us to know whether the priors are specified too close to the real wind profile or not. Here, the current experiment is designed to show that the method is able to estimate wind speeds and directions at several elevations when the problem is known to be solvable (i.e., relatively fewer variables are to be estimated). More discussion on the use of simplified and non-simplified wind profiles in tephra inversion is given in the following section.
Elevation levels 5, 14, and 18 km are chosen because wind directions at these elevations are 25°, 65°and 85°, respectively. We "collect" tephra mass per unit area at 495 randomly selected sample sites (Fig. 1b). The dataset size is greater than commonly-seen thickness or mass per unit area datasets of tephra fall deposits. Estimating wind speed and direction at a few elevations and having 495 observations are not realistic in studies on tephra fall deposits. However, these are adopted such that we know that the problem is solvable, and can be used to validate the algorithm. The number of "observations" is chosen arbitrarily, but the value of this number (whether it is 500 or not) would not affect any results or conclusions from this experiment, because what we need is sufficient amount of observations. Any number that is above ∼350 would work.
Other ESPs, and wind directions and speeds at other elevations, are kept fixed throughout the implementation of the algorithm. Fifty thousand runs are done for three times with different starting points. In each run, the first 2000 runs are discarded to avoid auto-correlation, and further subsetting with an interval of 50 points along the sample chain are subsetted as the final results. The resultant sample posterior distributions from the three runs is similar, suggesting that the results converge. The posterior mean and standard deviation for each variable of interest are listed in Table 7. They are highly consistent with specified values used to generate the dataset, and the posterior standard deviations are smaller than those of the priors. The results suggest that the algorithm functions when a non-simplified wind profile is adopted and the number of variables of interest limited. Greater uncertainty is obtained for wind direction and speed at higher elevations (i.e., 14 and 18 km). This is due to the fact that the wind speed at higher elevations is generally greater, and the estimated uncertainty scales with it.

Discussion
In this work, we introduce an algorithm coupling the ash dispersal model Tephra2 with the Metropolis-Hastings implementation of MCMC. We validated it with synthetic data generated by Tephra2. By varying the inputs to the algorithm and observation datasets one at a time, we examine and explain how they affect the performance and efficiency of the algorithm. Three sets of experiments are done. The first set focuses on simple scenarios (i.e., with simplified wind profile) with two variables of interest (i.e., total eruption mass and column height) given ten or thirty observations. In these experiments, the algorithm is shown to work well, and has the ability to quantify the uncertainty in the estimate. Resultant posterior Table 7 True values, specified prior types and parameters, and posterior means and standard deviations for the Set 3 experiment with the non-simplified wind profile distributions from these experiments are consistent with expectations from Bayes' rule.
In the second set of experiments, we focus on estimating posterior distributions of six ESPs and two wind-related variables. The results suggest that posterior distributions of most of the variables of interest are greatly updated from their corresponding priors. While posterior distributions of column height and diffusion coefficient are similar to their priors, respectively. This is because a relatively strong prior is specified for column height, and it is harder to constrain the posterior distribution of diffusion coefficient. The supplementary experiments in Set 2 suggest that diffusion coefficient can be well estimated in simpler scenarios.
In Set 3, we set out to estimate two ESPs, and wind directions and speeds at three elevation levels given sufficient observations. In this experiment, wind direction and speed are set to vary with elevation. The results suggest that the algorithm could work well with a non-simplified wind profile.
Our discussion here focuses on advantages and limitations of the algorithm, interpreting the posterior correlation between column height and total eruption mass in our experiments, and whether we should attempt to estimate wind direction and speed at each elevation in tephra inversion when working with Tephra2. The algorithm is then applied to the mass per unit area dataset of the 2011 Kirishima-Shinmoedake tephra deposit to infer the corresponding ESPs.

Advantages and limitations
The main advantages of the algorithm are that it makes use of prior knowledge on a deposit and eruption, and quantifies the uncertainty in the estimate of ESPs in a statistically formal manner.
In studies on tephra fall deposits, previous knowledge plays a critical role in determining the ESPs and reconstruction of volcanic eruptions (Sparks et al. 1997;Mastin et al. 2009). Such knowledge has uncertainty within it. How to properly incorporate such uncertainty in the estimated results is challenging without a probabilistic Bayesian framework. With the algorithm, prior knowledge about the studied deposit and eruption, and their associated uncertainties, is denoted as the prior probability distribution, and incorporated in the estimate. Practically speaking, prior knowledge is being used consistently throughout the implementation of the algorithm. That is, the prior probability helps determine whether to accept or reject a proposed point in each draw of the algorithm. However, for a non-probabilistic inversion method, such as gradient methods, prior knowledge might be used only once in the inversion process-it helps determine the starting point.
In tephra deposit inversion, uncertainty in the estimated ESPs comes from the interplay of multiple sources, which include the uncertainty in the prior, measurement uncertainty, and potential model uncertainty. Non-probabilistic inversion method helps us find the optimum ESPs that fit well to field observations, but the uncertainty cannot be quantified.
The algorithm samples from the posterior distribution without any presumptions. This means that (1) its results are fully Bayesian; and (2) more flexibility is given to the users, as they could change forms of priors and likelihood functions based on their own needs (either by choosing the available options in the present version of the code or by modifying the code). Therefore, the algorithm can be used to explore how different factors and components in tephra inversion, such as sample site distribution and form of the likelihood function, affect the results. (White et al. 2017) proposed an efficient inversion method which quantifies the posterior uncertainty with linear analysis. The linearity assumption allows the method to operate with efficiency, but the prior and likelihood function in their method have to be Gaussian such that the method functions properly, and so are the resultant posterior distributions. This might be less convenient when certain variables are known to have bounds (e.g., standard deviation of grain size distribution being greater than zero).
Efficiency and the ability to quantify the uncertainty are main motivations driving the development of different tephra inversion techniques (Connor and Connor 2006;Klawonn et al. 2012;White et al. 2017;Mannen et al. 2020), but there is always a tradeoff between these motivations. Therefore, we think that different tephra inversion methods are equally important, and users should decide which method to use based on their specific needs. The presented algorithm represents one end of the spectrum: it only focuses on sampling from the posterior distribution in a statistically formal way without considering efficiency (i.e., no simplifications are made in the algorithm). This is not possible without the low computational cost of Tephra2.
Whether the priors, number of draws, and standard deviations of the proposal functions are specified properly or not affects the performance of the algorithm. This problem arises as long as the M-H algorithm is adopted, and there is no definitively correct way to determine some inputs to the algorithm. Measures to check whether it is used properly are given. The introduction on the algorithm and experiments in Set 1 are presented such that how each element in the algorithm affects its performance is given with demonstration.

Correlation in the posterior distribution
Correlations are detected in the posterior distributions in our synthetic experiments (Tables 4 and 6). Here we focus on posterior correlations in Set 1 experiments, as their setups are simpler. Even in such extremely simplified scenarios, as shown below, properly interpreting the correlation is not straightforward. Correlations in these experiments are shown in Table 4, and the bivariate posterior distributions from selected experiments are shown in Fig. 3.
We find that both negative and positive correlations exist between column height and total eruption mass in the sampled posterior distributions, and whether the correlation is positive or not depends on sample site localities. In Experiments # 10 and 11, the correlation is positive This can be explained by the physics of tephra transport. If observations are made at distal localities (i.e., Experiments # 0-9), the combination of (a) a greater column height, which allows tephra to be dispersed farther downwind and leads to more tephra deposition at distal sites, and (b) a smaller total eruption mass (eruption mass is proportional to tephra thicknss/mass per unit area everywhere) leads to results similar to the combination with a lower column height and a greater total eruption mass (within the area where footprints of tephra deposits overlap). Therefore, the correlation is negative in Experiments # 0-9.
A lower column height leads to a thicker tephra deposit at proximal sites because of less interaction with wind, and less time for turbulent diffusion to disperse tephra to distal localities. Total eruption mass is always proportional to tephra thickness and mass per unit area. The above arguments suggest that scenarios with (1) greater column height and greater total eruption mass and (2) lower column height and lower total eruption mass lead to similar tephra thickness or mass per unit area if the observations are all made at proximal sites.
These relationships are consistent with previous studies (e.g., Suzuki and et al (1983) and ), but how their interaction with sample site locations would affect the correlation between the variables of interest in tephra inversion has not been reported or noted. The correlation in Experiment # 10 is surprisingly high (0.986), which is due to the facts that the sample sites are too close to each other, and the "observed" tephra mass per unit area values at these sites have similar values. To the algorithm, the ten observations in Experiment # 10 are almost the same (both their locations and their observed values). The high correlation reveals that non-unique solutions exist in Experiment # 10.
The presence of correlation in the posterior distribution suggests that the interaction of variables plays a role in tephra inversion. This is consistent with results from sensitivity analysis on Tephra2 in the work of (Scollo et al. 2008). This finding suggests that the algorithm has the potential to be used to discover intrinsic relationships (interactions) between variables of interest and wind conditions in Tephra2 and other dispersion models, and could thus improve our understanding on different sources of uncertainty in tephra inversion.

Whether to estimate wind direction and speed at each elevation
The algorithm can be used with either simplified or nonsimplified wind profiles. It is always preferred to have a more exact and detailed understanding on wind conditions in tephra inversion. Estimating a lot of variables at the same time could be challenging for the algorithm as the number of draws has to be finite. It is common to estimate six or more ESPs (e.g., column height, total eruption mass, column mass distribution, mean and standard deviation of grain size distribution, and diffusion coefficient) in tephra inversion. If wind direction and speed are to be estimated at ten elevations, this adds up to at least 26 = 6 + 10 × 2 variables of interest. Considering just two values for each variable, this means 2 26 (which is greater than 60 million) possible combinations for the 26 variables. The number of field measurements for tephra fall deposits rarely exceeds 300.
This problem could be resolved by the method proposed by White et al. (2017), which is able to estimate a full complement of uncertain ESPs and wind conditions, and provide estimates of posterior variances. This is owing to the use of regularization techniques in their method that can accommodate much higher dimensionality. A solution can be found, but how the solution connects to the wind profile in reality (how well the solution reproduces reality) is another interesting and different topic. The wind profile only affects advection of tephra dispersal in Tephra2. Similar to vector decomposition, the total effect of advection for tephra particles with a certain grain size could always be decomposed into different combinations of advection effects at each elevation level. Multiple or a series of optimum solutions of wind profile are likely to exist. Therefore, we think that it is not always necessary to estimate the wind speed and direction at each elevation level especially given extremely sparse observations. As no presumptions are adopted in the present algorithm, allowing two ways to specify the wind profile in the algorithm could help address questions associated with wind conditions in tephra inversion, such as under what conditions and how details of the wind profile affect the inversion results.

Application to the Kirishima-Shinmoedake dataset
In this section, we apply the M-H algorithm to a dataset containing mass of tephra per unit area for the 2011 Kirishima eruption. The goals are to show that Tephra2 outputs from using posterior means from the algorithm as ESPs and wind conditions would resemble field observations, and that the posterior distributions can be characterized by lower uncertainty compared to their corresponding priors. We refrain from going into details in the physics of tephra transport and how the interaction between variables would affect the estimates. This is because tephra dispersal processes that are not taken into account by Tephra2 did affect tephra dispersal during the eruption (Mannen et al. 2020) . For the same reason, comparing observed and predicted ESPs is not done in the present work (because we know that even if we apply the observed ESPs and wind conditions to Tephra2, the prediction would still be different from observations given the presence of model uncertainty). Discussions on model uncertainty are outside the scope of this work. See (Mannen et al. 2020) for a detailed, careful, and strict treatment of tephra inversoin for this deposit.
The Kirishima-Shinmoedake event took place from 26 to 29 January, 2011, with an eruption of the Shinmoedake volcano. The column height ranged from 6.2-8.6 km above the crater, based on different models and Doppler radar measurements Maeno et al. 2014). The total eruption mass was estimated to be 1.8 − 3.1 × 10 10 kg by Nakada et al. (2013). The mass of tephra erupted from the afternoon of 26 January to the early morning of 27 January, which corresponds to the current dataset, is about 1.4 -2.5 × 10 10 kg ). The wind was blowing to the southeast, and the wind profile is reported in Hashimoto et al. (2012).
The tephra deposit data we are using are reported in White et al. (2017). Detailed description of the dataset can be found in White et al. (2017) and Mannen et al. (2020). Tephra thickness and grain size distributions were measured at 55 locations downwind from the vent. In addition, tephra thickness was measured at another 63 locations. The thickness measurements were converted to mass per unit area (Fig. 4a).
We set the ESPs to be estimated as column height, eruption mass, α (in this experiment, β is fixed to be one), median and standard deviation of grain size distribution, diffusion coefficient, fall time threshold, and densities of lithic and pumice fragments. The eddy constant is fixed as 0.04. For the wind condition, a simplified wind profile is adopted, to avoid overcomplication of the problem, as discussed before. We assume that the wind speed increases linearly from 0 to 11 km a.s.l., and then decreases with elevation to 24 km a.s.l. This setup is based on the wind speed profile reported in (Hashimoto et al. 2012). Wind direction and maximum wind speed are the two variables to be estimated for the wind profile. This amounts to 11 variables of interest and 118 mass per unit area observations for the problem. The priors of these variables are inferred based on Nakada et al. 2013;Miyabuchi et al. 2013;Maeno et al. 2014;White et al. 2017), and are shown in Table 8. These priors are generally consistent with the priors defined in (White et al. 2017) except that eddy constant is set to be a fixed value in our experiment.
We draw fifty thousand samples; this process is repeated three times. Sample distributions from the three runs are almost identical to each other, suggesting that the results converge. The first 5000 samples are discarded to exclude  Table 8 for their values. The corresponding isopachs are also shown in the four subfigures. Levels of these isopachs are 50, 25, 10, 5, 1 kg/m 2 ; c: absolute difference (point size) and difference (point color) between simulated and observed masses per unit area. d: relative error, i.e., (simulation-observation)/observation for each sample. Points that have relative error with magnitude greater than 1 are marked as crosses Table 8 Priors and posterior means and standard deviations from applying the algorithm to the 2011 Kirishima-Shinmoedake eruption tephra mass per unit area dataset. Priors of column height, total eruption mass, and median and standard deviation of grain size distribution are referenced and inferred from Nakada et al. 2013;Miyabuchi et al. 2013;Maeno et al. 2014;White et al. 2017). Priors of α/β ratio, diffusion coefficient, fall time threshold, and pumice and lithic densities are specified as commonly adopted ranges or maximum ranges possible the impact from the initial starting point. For the rest of the chain, we collect samples based on a 15-point interval to avoid autocorrelation. Large relative measurement uncertainty (i.e., 45%) is adopted here. This value is larger than the one (30%) adopted in White et al. (2017); Engwell et al. (2013). This is because from our experiments, we find that for some observations with very low magnitude, with 30% of measurement error, the likelihood always goes to zero. This means that the relative measurement error of these observations has to be greater than 30% (and assuming the absence of model uncertainty). We could either assign a greater measurement error to these observations, while keeping the others having 30% of relative measurement error, or assign a greater measurement error for all observations. Both can be done by the algorithm (but the former requires slight modification of the current code), but here we prefer the latter. This is consistent with the main goal of this experiment: to show that the algorithm is able to reproduce observations of a real tephra fall deposit, and this could avoid justifying how and why we adjust the measurement error for certain observations.
The results are summarized in Table 8. The resultant isopach maps, differences between observations and simulations, and relative errors are presented in Fig. 4. Posterior means of column height and total eruption mass are in general consistent with previous studies, and are updated from the priors with posterior standard deviations being much smaller. The simulated mass per unit area data from Tephra2 with posterior means as ESPs and wind conditions are plotted against field observations in Fig. 5, which suggests that Tephra2 could generally reproduce field observations based on estimated results from the algorithm. Posterior means of column height and total eruption mass are 7.3 km and 9.14 × 10 9 kg, respectively. The former is within the range of the observed column height, and the latter is smaller than estimates from previous work.
Posterior distributions of the other ESPs generally lie within commonly-seen ranges (Table 8), and are also altered from the corresponding priors. We note that the posterior mean of the median of grain size distribution is finer than data reported by Miyabuchi et al. (2013). We think that this could be explained by the fact that data reported by Miyabuchi et al. (2013) represent the grain size distribution at certain sample sites, whereas our estimate focuses on the total grain size distribution.

Conclusions
In this work, we couple the well-known M-H algorithm with the tephra transport and deposition model Tephra2. The coupled algorithm can be used to infer ESPs of explosive volcanic eruptions and ambient wind conditions based on thickness or mass per unit area measurements of tephra fall deposits under a Bayesian framework. It allows users to include their prior knowledge on the eruption or deposit with field observations in a statistically formal way. The result of the algorithm is presented as sample posterior distributions for the variables of interest. We introduce the model Tephra2 and basic elements of Bayes' rule, and present intuitive interpretations on the M-H algorithm. How to implement the coupled algorithm and formats of the input files are also introduced.
The coupled algorithm is validated with three sets of experiments. For the first set, we focus on two variables of interest. In these experiments, we vary values of the input and size of the synthetic observation dataset one at a time to show that the algorithm functions consistently with expectations based on the Bayes' rule, and also to show how inputs affect the performance of the algorithm. In the second set of experiments, we estimate eight variables of interest with poorly-constrained priors (except for column height). The results show that the algorithm is able to effectively estimate the posterior distributions of most of the variables of interest, but the posterior distributions of column height and diffusion coefficient are similar to their priors. For the former, that is because its prior is specified to be well-constrained. Supplementary experiments are done to show that the variable diffusion coefficient can be well-estimated in simpler scenarios. The combination of these experiments suggests the validity of the algorithm, and indicates that posterior distributions of some ESPs are harder to constrain compared to others.
In the experiment in Set 3, we set eight variables of interest to be estimated, including not only two ESPs, but also wind directions and speeds at three elevation levels. This experiment is set up to show that the algorithm works with a complex wind profile.
Advantages of the algorithm are that it has the ability to incorporate prior knowledge into the estimate in a statistically formal way, and to quantify the uncertainty in the estimate, and it captures correlation between variables of interest in the estimate. Because of these advantages, we think that the algorithm has the potential to improve our understanding on how different sources of uncertainty interact and affect the results in tephra inversion. The main limitation of the algorithm is that there are subjective choices in implementation, which affect its performance. How and why they affect the algorithm are introduced, and commonly adopted measures and references (Chib and Greenberg 1995;Andrieu et al. 2003;Kaipio and Somersalo 2006) on how to specify the inputs properly are given.
Correlations between variables of interest exist in our experiments. Interpretation based on the physics of tephra transport is given for the correlations in Set 1 experiments (which are extremely simplified): whether the correlation between column height and total eruption mass is positive or not depends on sample site locations. A greater column height has a positive and negative relationship with tephra mass per unit area at distal and proximal sample sites, respectively, and total eruption mass is always proportional to tephra mass per unit area regardless of sample site location.
The algorithm supports specifying and estimating the wind profile in two ways. The first one takes advantage of a simplified wind profile based on four variables of interest, and assumes that the wind direction does not change with elevation. The second one allows users to estimate wind speed and direction at certain or all specified elevation . We argue that users need to be cautious in choosing how to specify and estimate the wind profile, because the second way could introduce a lot more variables to be estimated, and might be unnecessary. How to choose the appropriate way to specify and estimate the wind profile relies on factors such as prior knowledge of weather conditions and sample site distributions. We think that by experimenting on appropriate synthetic data, this question can be addressed. We apply the algorithm to the 2011 Kirishima-Shinmoedake tephra dataset, and the results are in general consistent with observations from previous work. We hope that the present work benefits future studies that attempt to implement tephra inversion and quantify the associated uncertainty.