How good is good? Probabilistic benchmarks and nanofinance+

Benchmarks are standards that make it possible to identify opportunities for improvement among comparable units. This study proposes a two-step methodology for calculating probabilistic benchmarks in noisy data sets: (i) double-hyperbolic undersampling filters the noise of key performance indicators (KPIs), and (ii) a relevance vector machine estimates probabilistic benchmarks with the denoised KPIs. The usefulness of the methods is illustrated with an application to a database of nanofinance+. The results indicate that, in the case of nanofinance groups, higher discrimination power is obtained with variables that capture the macroeconomic environment of the country where a group operates. The estimates also show that groups operating in rural regions have different probabilistic benchmarks than groups in urban and peri-urban areas.


Introduction
Benchmarking is the process of analyzing key performance indicators with the aim of creating standards for comparing competing units (Bogetoft and Otto, 2010). Probabilistic benchmarks measure the probability of a unit falling into an interval, along with the cumulative probability of exceeding a predetermined threshold (Wolfe et al., 2019). As a management tool, benchmarks make it possible to identify and apply better-documented practices (Bogetoft, 2013).
Benchmarks are widely used in diverse scientific disciplines. In pharmaceutics, the prices of prescription drugs are compared against benchmarks (Gencarelli, 2005). In environmental science, benchmarks set water quality standards (Dam et al., 2019) or define thresholds for radiation risk (Bates et al., 2011). In finance, interest-rate benchmarks mitigate search frictions by lowering informational asymmetries in the markets (Duffie, Dworczak, and Zhu, 2017).
This study develops a two-step process for calculating probabilistic benchmarks in noisy datasets.
In step 1, double-hyperbolic undersampling filters the noise of key performance indicators (KPIs); in step 2, a relevance vector machine estimates probabilistic benchmarks with the filtered KPIs.
Archimedean copulas approximate the joint density of the KPIs during the denoising step. Besides estimating probabilistic benchmarks, the methods of step 2 identify the continuous and categorical factors influencing the benchmarks.
The two-step methodology is illustrated with an application to a database of nanofinance+ groups working with business interventions. In nanofinance, low-income individuals without access to formal financial services get together and start to accumulate their savings into a fund, which they later use to provide themselves with loans and insurance. In nanofinance+ (NF+), development agencies, donors and governments help communities to create NF+ groups for financial inclusion, and the groups then become a platform for additional 'plus' sustainable development programs; see Gonzales Martínez (2019) for details.
The methods proposed in this study complement the state of the art in probabilistic benchmarking of Chakarov and Sankaranarayanan (2014), Chiribella and Adesso (2014) or Yang, Chiribella, and Adesso (2014). Along with this methodological contribution, the empirical findings of this document fill the research gap left by economic studies that have focused only on calculating benchmarks for microfinance institutions; see for example Tucker (2001) or Reille, Sananikone, and Helms (2002). In microfinance, benchmarks are used to compare institutions; in nanofinance, benchmarks aim to compare groups. Benchmarks for nanofinance groups make it possible to set performance standards for monitoring and evaluating intervention programs implemented in communities worldwide.
The definition of multivariate probabilistic benchmarks used in the study is described in Section 2. Section 3 discusses the methods for estimating multivariate probabilistic benchmarks in noisy datasets. Section 4 shows the empirical application to the NF+ database. Section 5 concludes.
The data and the MATLAB code that replicate the results of the study are freely available at MathWorks File Exchange (https://nl.mathworks.com/matlabcentral/fileexchange/74398-doublehyperbolic-undersampling-probabilistic-benchmarks).

Multivariate probabilistic benchmarks
Classical benchmarking makes use of fixed inputs to calculate point estimates for classification standards. Probabilistic benchmarking, in contrast, takes into account elements of uncertainty in the inputs and thus generates interval estimates as an output (Liedtke et al., 1998). For example, probabilistic benchmarks are calculated for quantum information protocols (teleportation and approximate cloning) in Yang, Chiribella, and Adesso (2014); more recently, Lipsky et al. (2019) calculate probabilistic benchmarks for noisy anthropometric measures, and Wolfe et al. (2019) use probabilistic benchmarks to quantify the uncertainty in fibromyalgia diagnosis.
Proposition 1 below shows the definition of multivariate probabilistic benchmarks used in this study.
In Proposition 1, the discrimination of η_1, η_2, ..., η_N units in a comparable set H is based on interval estimates of a multi-dimensional threshold (the benchmark) τ. Proposition 1 sets a probabilistic standard based on the joint multivariate distribution function of the KPIs y = {y_1, y_2, ..., y_j} used for calculating τ. The isolines (contour intervals) defined by the benchmarks τ make it possible to identify the units h_τ with a different performance in the unit hypercube (h_τ ⊂ H).
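Under Proposition 1, the cumulative probability of exceeding a two-dimensional threshold τ can be approximated directly from joint samples of the KPIs. A minimal numpy sketch (the synthetic data, the threshold and the function name are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint sample of two positively dependent KPIs
# (e.g. returns on savings and equity per member) -- synthetic data.
z = rng.normal(size=(10_000, 2))
y = np.column_stack([z[:, 0], 0.7 * z[:, 0] + 0.3 * z[:, 1]])

def exceedance_probability(samples, tau):
    """Empirical P(y_1 > tau_1, y_2 > tau_2) from joint samples."""
    return np.mean(np.all(samples > np.asarray(tau), axis=1))

# Probability mass beyond the isoline through tau = (1, 1):
p = exceedance_probability(y, (1.0, 1.0))
```

The isolines of Proposition 1 are then level sets of this exceedance probability over a grid of candidate thresholds.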
Proposition 2 below states that the thresholds τ can be calculated without knowing the exact form of the joint density f_y(y) in Equation 1. Let C_Θ(u) be a multivariate cumulative distribution function with uniform marginal distributions u ∈ {u_1, u_2, ..., u_d} and a dependence structure defined by Θ. If u ≡ F_y(y), the joint density of y needed to calculate τ can be approximated by simulating C_Θ(u) in the unit hypercube. Proposition 2 is based on Sklar's theorem (Sklar, 1959; Sklar, 1996), which indicates that any multivariate joint distribution can be written in terms of univariate marginal distribution functions and a copula C_Θ that captures the co-dependence between the variables (Durante, Fernandez-Sanchez, and Sempi, 2013).
Archimedean copulas are a family of copulas that approximate the joint multivariate distribution of KPIs that are not elliptically distributed (Naifar, 2011). In an Archimedean copula C_g, an additive generator function g(u) models the strength of dependence in arbitrarily high dimensions with only one scalar parameter, θ (Smith, 2003). Formally, C_g(u_1, ..., u_d) = g^(-1)(g(u_1) + ... + g(u_d)), with g(u) a generator function that satisfies g(1) = 0, g'(u) < 0 and g''(u) > 0 for all 0 ≤ u ≤ 1; hence C_θ ≡ C_g. In Clayton's Archimedean copula, for example, the generator function is g_θ(u) = u^(-θ) − 1 for θ > 0 (McNeil and Neslehova, 2009; Cherubini et al., 2011).
Based on Propositions 1 and 2 above, a two-step process is suggested to calculate multivariate probabilistic benchmarks in noisy data sets:
1. In the first step, a swarm algorithm estimates the parameter vector of a double-hyperbolic noise filter. The optimal estimates of this vector maximize the dependence structure θ of an Archimedean copula calculated with the noisy KPIs. The optimal double-hyperbolic filter that maximizes θ is then used to denoise the KPIs.
2. In the second step, a relevance vector machine is applied to the denoised KPIs in order to calculate multivariate probabilistic benchmarks. Besides estimating isolines of benchmarks, the relevance vector machine makes it possible to identify factors that influence the benchmarks.
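To make the generator construction concrete, the sketch below builds the bivariate Clayton copula from its generator, C_g(u_1, u_2) = g^(-1)(g(u_1) + g(u_2)), and checks two textbook copula properties numerically (the θ values are arbitrary illustrative choices):

```python
import numpy as np

def clayton_generator(u, theta):
    """Clayton generator g_theta(u) = u^(-theta) - 1, valid for theta > 0."""
    return u ** -theta - 1.0

def clayton_generator_inv(t, theta):
    """Inverse generator g^(-1)(t) = (t + 1)^(-1/theta)."""
    return (t + 1.0) ** (-1.0 / theta)

def clayton_cdf(u1, u2, theta):
    """Bivariate Archimedean copula C(u1, u2) = g^(-1)(g(u1) + g(u2))."""
    return clayton_generator_inv(
        clayton_generator(u1, theta) + clayton_generator(u2, theta), theta
    )

# Boundary property of any copula: C(u, 1) = u.
u = np.linspace(0.05, 0.95, 19)
assert np.allclose(clayton_cdf(u, 1.0, theta=2.0), u)

# As theta -> 0 the copula approaches independence: C(u1, u2) -> u1 * u2.
assert abs(clayton_cdf(0.3, 0.6, theta=1e-6) - 0.3 * 0.6) < 1e-4
```

The same construction extends to d dimensions by summing d generator terms before inverting.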

Step 1: Double-hyperbolic undersampling and swarm optimization
Let f_h(ψ, y) be the real part (the imaginary part is discarded) of a translated generalized hyperbola of the form given in Hamilton and Knop (1998). If f_h^⊥(ψ^⊥, y) is an orthogonal/quasi-orthogonal rotation of the translated generalized hyperbola defined by Equation 4, with rotation parameters ψ_1^⊥, ψ_2^⊥, ψ_3^⊥, ψ_4^⊥ ∈ ψ^⊥, then the region of the double hyperbola defined by the lobes of f_h(ψ, y) and f_h^⊥(ψ^⊥, y) can be used to filter the noise of the joint distribution of y: elements of y outside the lobes of f_h(ψ, y) and inside the lobes of the rotated hyperbola f_h^⊥(ψ^⊥, y) are discarded.
Let y_h ⊂ y be the vector with the non-discarded elements of y, i.e. those inside the lobes of f_h(ψ, y) and outside the lobes of f_h^⊥(ψ^⊥, y). The vector y_h is an optimal noise reduction of the original data y if the values of ψ and ψ^⊥ maximize the dependence structure θ of an Archimedean copula estimated with samples of y. Box 1 below shows the swarm algorithm proposed to estimate the optimal values of ψ and ψ^⊥ that maximize θ. The algorithm maximizes the co-dependence in the Archimedean copula by taking samples of the KPIs contained in y. The structure of the swarm algorithm (separation, alignment, cohesion) is inspired by the BOIDS artificial-life algorithm described in Reynolds (1987).
Box 1. Pseudo-code of the swarm algorithm. Data: y = {y_1, y_2, ..., y_j}.
In the swarm algorithm, δ, M, θ_0, p_0, p_0^⊥, ζ, ζ* are initialization parameters. The parameter δ ∈ R+ controls the initial dispersion of the particles; M is the initial number of particles used to explore possible values of θ; θ_0 = 0 is the starting value of θ_m; p_0, p_0^⊥ are the starting values of p_m, p_m^⊥; and ζ, ζ* are parameters that control the degree of exploration in the swarm algorithm. Exploitation (δ) and exploration (ζ, ζ*) parameters are typical of metaheuristic algorithms in general and of swarm intelligence in particular; see for example Tilahun (2019).
The algorithm described in Box 1 explores optimal values of the hyperbola parameters p_m, p_m^⊥ during m iterations, based on two behavioral rules: cohesion and separation. Swarm cohesion depends on the Euclidean norm between p_m, p_m^⊥ and the optimal values p_m*, p_m^⊥* calculated with θ*. Swarm separation is a function of the norm between p_m, p_m^⊥ and the swarm centroids. Cohesion keeps the swarm from including extreme outliers, and thus avoids a biased estimation of θ, while separation guarantees that the swarm properly explores all the potential values that can maximize θ for an optimal noise filtering. Alignment is achieved by gradually reducing exploration and exploitation with ζ* (0 < ζ* < ζ ≤ 1).
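The logic of step 1 can be sketched end-to-end. The fragment below is a deliberately simplified stand-in for Box 1: the hyperbolas are axis-aligned rather than freely translated and rotated, the swarm is reduced to a contracting random search (a cohesion-style rule only), and θ is recovered from Kendall's τ through the Clayton relation θ = 2τ/(1 − τ). All parameter names are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def kendall_tau(x, y):
    """O(n^2) Kendall's tau; adequate for the small samples used here."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def clayton_theta(x, y):
    """Clayton co-dependence implied by Kendall's tau: 2*tau/(1 - tau)."""
    tau = kendall_tau(x, y)
    return 2.0 * tau / max(1.0 - tau, 1e-6)

def in_lobes(x, y, a, b):
    """Points inside the two lobes of the hyperbola (x/a)^2 - (y/b)^2 = 1."""
    return (x / a) ** 2 - (y / b) ** 2 >= 1.0

def filter_mask(x, y, p):
    """Keep points inside the lobes of the first hyperbola and outside the
    lobes of its 90-degree rotation (simplified double-hyperbolic filter)."""
    a1, b1, a2, b2 = p
    return in_lobes(x, y, a1, b1) & ~in_lobes(y, x, a2, b2)

def denoise(x, y, rounds=3, particles=30, scale=1.0):
    """Contracting random search for filter parameters maximizing theta.
    The unfiltered estimate is kept as a baseline candidate."""
    best_mask = np.ones_like(x, dtype=bool)
    best_theta = clayton_theta(x, y)
    best_p = np.ones(4)
    for _ in range(rounds):
        for _ in range(particles):
            p = np.abs(best_p + scale * rng.normal(size=4)) + 1e-3
            m = filter_mask(x, y, p)
            if m.sum() < 30:          # filter too aggressive, skip
                continue
            th = clayton_theta(x[m], y[m])
            if th > best_theta:
                best_theta, best_mask, best_p = th, m, p
        scale *= 0.5                  # cohesion: contract around the best
    return best_theta, best_mask

# Synthetic demo: dependent KPIs contaminated with independent outliers.
x = np.concatenate([rng.normal(size=300), rng.uniform(-4, 4, size=30)])
y = np.concatenate([x[:300] + 0.5 * rng.normal(size=300),
                    rng.uniform(-4, 4, size=30)])
theta_raw = clayton_theta(x, y)
theta_best, kept = denoise(x, y)
```

By construction the search never returns a θ below the unfiltered estimate; the full algorithm in Box 1 adds the separation and alignment rules to avoid local optima.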

Step 2: Relevance vector machines
Traditional methods of supervised learning, such as support vector machines, produce point estimates of benchmarks as an output. Relevance vector machines, in contrast, estimate the conditional distribution of multivariate benchmarks in a fully probabilistic framework. Compared to support vector machines, relevance vector machines capture uncertainty and use a small number of kernel functions to produce posterior probabilities of membership classification.
Let x = (x_1, ..., x_k) be a k-set of covariates influencing the KPIs contained in y. The importance of each covariate is defined by a weight vector w = (w_0, ..., w_k). In a linear approach, y = w'x. In the presence of a non-linear relationship between y and x, a nonlinear mapping x → φ(x) provides a basis function for y = w'φ(x).
Given additive noise ε_k, the benchmark targets t will be t_k = w'φ(x_k) + ε_k, where the ε_k are independent samples from a mean-zero Gaussian noise process with variance σ². Tipping (2000) and Tipping (2001) offer a sparse Bayesian learning approach to estimate w in Equation 6, based on the likelihood of the complete data set for a kernel function K(·, ·). With a zero-mean Gaussian prior over w, α is a vector of k + 1 hyperparameters, and a posterior distribution over the weights is obtained. The assignment of an individual hyperparameter α_k to each weight w_k achieves sparsity in the relevance vector machine: as the posterior distribution of many of the weights is peaked around zero, non-zero weights are associated only with 'relevant' vectors, i.e. with the most relevant factors influencing the probabilistic benchmarks estimated with the denoised KPIs.
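Tipping's sparse Bayesian updates can be sketched as a minimal relevance-vector regression. This is a simplified illustration (RBF kernel, fixed-point re-estimation of α and the noise precision β = 1/σ²), not the implementation used in the paper; the function names and the sinc test signal are illustrative:

```python
import numpy as np

def rbf(x1, x2, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two 1-D sample vectors."""
    return np.exp(-gamma * (x1[:, None] - x2[None, :]) ** 2)

def rvm_fit(x, t, gamma=0.5, n_iter=200):
    """Minimal RVM regression: one hyperparameter alpha_k per weight and
    a shared noise precision beta, re-estimated by fixed-point updates."""
    n = len(x)
    Phi = np.hstack([np.ones((n, 1)), rbf(x, x, gamma)])  # bias + kernels
    alpha = np.ones(Phi.shape[1])
    beta = 1.0 / (np.var(t) + 1e-6)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t
        g = 1.0 - alpha * np.diag(Sigma)      # well-determinedness factors
        alpha = np.clip(g / (mu ** 2 + 1e-12), 1e-12, 1e12)
        beta = (n - g.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
    return mu, Sigma, beta

def rvm_predict(x_new, x, mu, Sigma, beta, gamma=0.5):
    """Posterior predictive mean and variance at new inputs."""
    phi = np.hstack([np.ones((len(x_new), 1)), rbf(x_new, x, gamma)])
    mean = phi @ mu
    var = 1.0 / beta + np.einsum('ij,jk,ik->i', phi, Sigma, phi)
    return mean, var

# Noisy nonlinear target; most posterior weights are pruned toward zero.
rng = np.random.default_rng(2)
x = np.linspace(-5, 5, 60)
t = np.sinc(x) + 0.05 * rng.normal(size=60)
mu, Sigma, beta = rvm_fit(x, t)
mean, var = rvm_predict(x, x, mu, Sigma, beta)
```

Most entries of μ collapse toward zero as their α_k grows, leaving a small set of relevance vectors, while the predictive variance carries the uncertainty that probabilistic benchmarks require.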

Empirical application: probabilistic benchmarks in nanofinance+
This section illustrates the methods described in Section 3 with an application to a database of 7830 nanofinance+ groups receiving entrepreneurship and business training in 14 countries: Benin, Burkina Faso, Ethiopia, Ghana, Malawi, Mozambique, Niger, Sierra Leone, South Africa, Sri Lanka, Tanzania, Togo, Uganda and Zambia. Almost all of the groups in the database work with a development agency (94%), and 43% of the groups are located in rural regions. Table 1 shows descriptive statistics of group-level characteristics and of the macroeconomic environment of the countries where the groups operate. On average, each member of a NF+ group contributes around 29 USD of savings to the common fund and receives a loan of 22 USD. Despite the low values of savings and loans, returns on savings in the groups average 47%, and the equity per member averages 40 USD (Table 1).
Successful units, i.e. NF+ groups with a higher financial performance, will be those with KPIs delimited by the isolines of the threshold τ, for a probabilistic benchmark τ = {τ_1, τ_2}, τ ∈ R^2.
Following Proposition 2, the joint density of the KPIs (Equation 7) is approximated with a bivariate Archimedean copula. Clayton's Archimedean copula is particularly suitable for modeling the dynamics of nanofinance+.
Clayton's copula has greater dependence in the lower tail than in the upper tail. In the case of NF+, greater lower-tail dependence is expected because groups with low equity will have zero or negative returns, while there is more dispersion in the indicators of groups with higher performance: some groups show high equity but low returns due to lower repayment rates, while groups with low equity may have higher returns due to the higher interest rates charged on their loans.
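This tail asymmetry can be checked by simulation. The sketch below draws from a bivariate Clayton copula by inverting its conditional distribution (a standard sampling route; θ = 2 is an arbitrary illustrative value, not an estimate from the NF+ data):

```python
import numpy as np

rng = np.random.default_rng(3)

def clayton_sample(n, theta):
    """Conditional-inversion sampling from a bivariate Clayton copula:
    given u1 and a uniform w, solve dC/du1 = w for u2."""
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)
    u2 = ((w ** (-theta / (theta + 1.0)) - 1.0)
          * u1 ** -theta + 1.0) ** (-1.0 / theta)
    return u1, u2

u1, u2 = clayton_sample(100_000, theta=2.0)

# Clayton concentrates co-movement in the lower tail:
lower = np.mean((u1 < 0.2) & (u2 < 0.2))   # ~ C(0.2, 0.2) = 1/7
upper = np.mean((u1 > 0.8) & (u2 > 0.8))
```

For θ = 2 the lower-tail mass C(0.2, 0.2) = (2 · 0.2^(-2) − 1)^(-1/2) = 1/7 ≈ 0.143, more than three times the independent value 0.2 · 0.2 = 0.04, while the corresponding upper-tail mass is only about 0.086.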
A bivariate Clayton Archimedean copula for the uniform marginal distributions of returns on savings (u_1) and equity per member (u_2) is C_θ(u_1, u_2) = (u_1^(-θ) + u_2^(-θ) − 1)^(-1/θ), with an associated probability density function and a co-dependence parameter θ ∈ [0, +∞). The parameter θ controls the amount of dependence in C_θ(u_1, u_2): when θ → +∞ the dependency between u_1 and u_2 approaches comonotonicity, while when θ → 0, u_1 and u_2 become independent. In the case of returns on savings and equity per member, θ is expected to be high, as both financial indicators should show lower-tail co-dependence in NF+. Figure 2 shows the optimal denoising of the KPIs of NF+ with double-hyperbolic undersampling.
The first step discards the values of ROS and EPM outside the lobes of the hyperbola estimated with ψ and inside the lobes of the hyperbola estimated with ψ^⊥ (Figures 2b and 2d). The co-dependence between the KPIs before denoising is contaminated by a high number of outliers (Figure 2e). After denoising, the co-dependence in the lower and upper tails of the KPIs is kept while noisy elements are discarded (Figure 2f). Table 2 and Figure 3 show the results of estimating the relevance vector machine with the denoised KPIs (step 2). In terms of continuous factors, the main covariates affecting the financial benchmarks of NF+ are those related to the macroeconomic environment, mainly GDP growth, poverty, inequality and the percentage of rural population in the country where a NF+ group operates (Table 2). Savings accumulation and loan provision are the main group-level characteristics influencing the financial benchmarks of NF+. This result is expected, because in NF+ the lending channel is the main source of profit generation, and it shows the ability of the relevance vector machine to properly detect variables related to financial benchmarks in denoised datasets.
In relation to categorical factors influencing the benchmarks, Figure 3 shows that the probabilistic benchmarks of NF+ differ between rural groups (Figure 3, left) and urban groups (Figure 3, right). While both rural and urban groups show a concentration of financial performance in the lower tail of the joint distribution of the KPIs, higher dispersion in the upper tail is observed in rural groups, and hence the isolines of the probabilistic benchmarks are wider for rural groups than for urban groups.
In the case of urban and peri-urban nanofinance, groups can be classified as successful with a probability higher than 90% (red contour isoline in Figure 3b) when they have returns higher than 55% and equity higher than 80 USD per member (Figure 3f). In rural NF+, however, groups that do not show negative returns and have an equity per member higher than 10 USD are classified as successful with a probability higher than 80% (Figures 3c and 3e).
Figure 2: Denoising with double-hyperbolic undersampling. For an optimal filtering of noise, the points outside the lobes of the first hyperbola are discarded in graph (b), and the points inside the lobes of the second hyperbola are discarded in graph (d). Graph (e) shows the relationship between the KPIs before denoising, and graph (f) shows the relation after denoising.

Conclusion
This study suggested a two-step approach for calculating probabilistic benchmarks with noisy KPIs.
An empirical application to a noisy database of nanofinance+ shows that the methods are able to denoise KPIs, estimate probabilistic benchmarks, and properly identify the continuous and discrete factors influencing the benchmarks.
In the case of NF+ groups with business training, the results indicate that macroeconomic factors and the region where a group is located influence its financial benchmarks. Governments, international donors and development agencies can use the estimated benchmarks to monitor the performance of NF+ and gain an independent perspective on how well a group or project is performing compared to similar groups or projects. In the presence of performance gaps, the benchmarks are useful for identifying opportunities for change and improvement among the groups.
Future studies can extend the denoising methods to the quadratic surface defined by hyperbolic cylinders. The higher-dimensional hierarchical Archimedean copula proposed by Savu and Trede (2010) can then be applied to approximate the multivariate probability distribution of KPIs denoised with hyperbolic cylinders. Recent developments in orthogonal machine learning (see inter alia Oprescu, Syrgkanis, and Wu (2018), Knaus (2018), Semenova (2018) or Kreif and DiazOrdaz (2019)) can be used to estimate quasi-causal factors influencing the benchmarks, complementing the non-parametric correlational approach of relevance vector machines.