Estimating the Trial-by-Trial Learning Curve in Perceptual Learning with Hierarchical Bayesian Modeling

The learning curve serves as a crucial metric for assessing human performance in perceptual learning. It may encompass various component processes, including general learning, between-session forgetting or consolidation, and within-session rapid relearning and adaptation or deterioration. Typically, empirical learning curves are constructed by aggregating tens or hundreds of trials of data in blocks or sessions. Here, we devised three inference procedures for estimating the trial-by-trial learning curve based on the multi-component functional form identified in Zhao et al. (submitted): general learning, between-session forgetting, and within-session rapid relearning and adaptation. These procedures include a Bayesian inference procedure (BIP), which estimates the posterior distribution of parameters for each learner independently, and two hierarchical Bayesian models (HBMv and HBMc), which compute the joint posterior distribution of parameters and hyperparameters at the population, subject, and test levels. The HBMv and HBMc incorporate variance and covariance hyperparameters, respectively, between and within subjects. We applied these procedures to data from two studies investigating the interaction between feedback and training accuracy in Gabor orientation identification across about 2000 trials spanning six sessions (Liu et al., 2010, 2012) and estimated the trial-by-trial learning curves at both the subject and population levels. The HBMc generated the best fits to the data and the smallest half widths of the 68.2% credible intervals of the learning curves compared to the BIP and HBMv. The parametric HBMc with the multi-component functional form provides a general framework for trial-by-trial analysis of the component processes in perceptual learning and for predicting the learning curve at unmeasured time points.

Several research groups have developed parametric methods to estimate the empirical learning curve on a trial-by-trial basis (Kattner, Cochrane, & Green, 2017a; Zhang et al., 2019a; Cochrane, Green, & Lu, submitted). By modeling the threshold and bias of the psychometric function as specific parametric functions of time or trial number during perceptual learning, researchers were able to construct high-quality trial-by-trial learning curves for each observer during training and transfer, based on the model's best-fitting parameters (Dale et al., 2021; Kattner, Cochrane, & Green, 2017a; Zhang et al., 2019a, 2019b).
One prerequisite of the parametric methods is a suitable parametric functional form. In this study, we developed and applied inference procedures to estimate the trial-by-trial learning curve in a Gabor orientation identification task based on the multi-component functional form (MCFF) of perceptual learning revealed in Zhao et al. (submitted) for this task.

Figure 1. The trial-by-trial generative model of the learning curve (black curve) across six sessions, comprising four latent component processes: general learning (yellow), between-session forgetting (purple), within-session re-learning (olive), and within-session adaptation (orange).

The HBMs consist of three levels of hierarchy: population, subject, and test, in which all subjects belong to a population and may in principle run the same experiment (called a "test") multiple times. In the HBMv and HBMc, the distributions of MCFF parameters at the test level are conditioned on the hyperparameter distributions at the subject level, which, in turn, are conditioned on the hyperparameter distribution at the population level. The HBMc includes covariance hyperparameters at the population and subject levels to capture relationships between and within subjects, while the BIP and HBMv do not.
Most perceptual learning experiments use a hierarchical experimental design structure (Kim et al., 2014; Yin et al., 2018), in which the study population is divided into multiple groups with different training protocols, each consisting of multiple subjects. Hierarchical models allow us to effectively combine information across subjects and groups while preserving heterogeneity (Kruschke, 2014; Rouder & Lu, 2005). The HBM typically consists of sub-models and probability distributions at multiple levels of the hierarchy and can compute the joint posterior distributions of the parameters and hyperparameters using Bayes' theorem based on all available data (Kruschke, 2014; Kruschke & Liddell, 2018). Taking advantage of conditional dependencies within and across levels, the HBM often reduces the variance of the estimated posterior distributions by decomposing variabilities from different sources using parameters and hyperparameters (Song et al., 2020) and by shrinking estimated parameters at lower levels toward the modes of higher levels when there is insufficient data at the lower level (Kruschke, 2014; Rouder et al., 2003; Rouder & Lu, 2005). It has found applications in many perception and cognitive science studies (Ahn et al., 2011; Lee, 2006; Merkle et al., 2011; Palestro et al., 2018; Prins, 2019; Rouder et al., 2003; Rouder & Lu, 2005; Wilson et al., 2020; Zhao et al., 2023a, 2023b, 2023c; Zhao, Lesmes, Dorr, et al., 2021; Zhao, Lesmes, Hou, et al., 2021).
In this paper, we introduce the multi-component functional form (MCFF) along with the contrast threshold psychometric function as the generative model of trial-by-trial performance in perceptual learning. We provide an overview of the Bayesian inference procedure (BIP), which performs an independent estimation of the posterior distribution of MCFF parameters and, therefore, of the trial-by-trial learning curve for each subject. Subsequently, we present the HBMv and HBMc models, which are designed to collectively estimate the joint posterior distribution of the MCFF hyperparameters and parameters and, therefore, the trial-by-trial learning curves for all subjects. We applied these procedures to data from two studies that investigated the interaction between feedback and training accuracy in Gabor orientation identification over approximately 2000 trials across six sessions (Liu et al., 2010, 2012). Our analysis involved comparing the goodness of fit and variability of the estimated trial-by-trial learning curves obtained from the three methods. Furthermore, we assessed the quality of the predicted learning curves from the HBMc for subjects with no, one, or two sessions of training data.

METHODS

Data
We obtained data from two previously published studies (Liu et al., 2010, 2012). All participants in the published studies had normal or corrected-to-normal vision. They completed the orientation identification task using an accelerated stochastic staircase procedure (Kesten, 1958) with 320 trials in each of the six daily sessions (s = 1, 2, …, S; S = 6). Fifty-nine of them received 60–80 trials of pre-training using a QUEST procedure (Watson & Pelli, 1983) at the beginning of the first session, while one subject did not receive pre-training. Detailed descriptions of the experimental procedures can be found in the original papers (Liu et al., 2010, 2012).

Apparatus
The original experiments were conducted using MATLAB (MathWorks Corp., Natick, MA, USA) on a Macintosh Power PC G4 computer with a Nanao Technology Flexscan 6600 monitor. Subjects viewed the displays binocularly at 72 cm in a dimly lit room, and they used a chin rest to maintain their head positions. Data analyses were performed on a Dell computer with an Intel Xeon W-2145 @ 3.70 GHz CPU (8 cores and 16 threads) and 64 GB of installed memory (RAM). The analyses were conducted using MATLAB and JAGS (Plummer, 2003) in R (R Core Team, 2003).

THEORETICAL FRAMEWORK
In this section, we first introduce the MCFF and the likelihood function. We then describe the Bayesian Inference Procedure (BIP), the Hierarchical Bayesian Model with population, subject, and test levels and variance hyperparameters (HBMv), and the Hierarchical Bayesian Model with population, subject, and test levels and covariance hyperparameters (HBMc).
To begin, each subject i ∈ [1, I] (where I = 60) in each test j completed T_ij trials in the study. We retain the index j for generality, although in this case we set J = 1 because all the subjects ran the experiment only once. The three inference methods were applied to the experimental data in a condition-blind manner, without considering the training accuracy and feedback conditions. Following group-blinded estimation, the data were unblinded and the resulting characteristics of learning were analyzed.

The MCFF
The log10 threshold learning curve is constructed by adding the contributions of all latent component processes in the MCFF (Figure 1). We consider four component processes: 1) General learning: b − r·log10(t), where t is the trial number, b represents the initial log10 threshold, and r is the general learning rate; 2) Between-session forgetting: a step function at the beginning of each daily session, characterized by a height that can vary across sessions (s); 3) Within-session rapid relearning: an elbow function with a rapid linear learning rate and an asymptotic level, both of which can vary across sessions (s); 4) Within-session adaptation or deterioration: a linearly increasing function with a rate that can vary across sessions (s).
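To make the construction concrete, the following minimal R sketch assembles a trial-by-trial log10 threshold curve from the four components. The parameter names (b, r, h, d, a, w) and the assumption that the forgetting step applies from session 2 onward are ours for illustration only; they are not the notation or the exact parameterization of the original model:

    # Trial-by-trial log10 threshold curve from the four MCFF components.
    # b: initial log10 threshold; r: general learning rate;
    # h[s]: between-session forgetting step; d[s]: rapid relearning rate;
    # a[s]: asymptotic relearning level; w[s]: within-session adaptation rate.
    mcff_curve <- function(b, r, h, d, a, w,
                           n_sessions = 6, trials_per_session = 320) {
      t  <- 1:(n_sessions * trials_per_session)       # global trial number
      s  <- ceiling(t / trials_per_session)           # session index
      ts <- t - (s - 1) * trials_per_session          # within-session trial number
      general    <- b - r * log10(t)                  # 1) general learning
      forgetting <- ifelse(s > 1, h[s], 0)            # 2) step at session start
      relearning <- pmax(-d[s] * ts, -a[s])           # 3) elbow down to -a[s]
      adaptation <- w[s] * ts                         # 4) linear deterioration
      general + forgetting + relearning + adaptation  # summed log10 threshold
    }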
In this study, we tested three versions of the MCFF. In the most saturated version considered in this study, based on the best model revealed in Zhao et al. (submitted), with K = 14 parameters, both the between-session forgetting heights and the asymptotic levels of within-session rapid relearning were free to vary across sessions. In one reduced model with K = 6 parameters, we constrained them to be the same across sessions. In the most simplified model with K = 2 parameters, we retained only general learning and removed all other component processes. For clarity, Table 1 lists the correspondence between θ_k and the original MCFF parameters for the three versions of the MCFF. On each trial t, the log10 contrast threshold ξ_ijt for subject i in test j can be calculated by adding the contributions of all the latent component processes with parameters θ_ij = (θ_1, θ_2, …, θ_K). The probability of obtaining a correct response, denoted as r_ijt = 1, to a stimulus with contrast c_ijt in trial t is described by a Weibull psychometric function (Figure 2a):

p(r_ijt = 1 | c_ijt, β, θ_ij) = γ + (1 − γ){1 − exp[−(c_ijt/10^ξ_ijt)^β ln((1 − γ)/(1 − γ_1.5))]}. (1a)

Here, in equation 1a,
• γ = 0.5 represents the guessing rate.
• β represents the slope of the Weibull psychometric function in a two-alternative forced-choice (2AFC) task.
• γ_1.5 = 0.856 is the probability of making a correct response when d′ = 1.5 in a 2AFC task.
The probability of an incorrect response is

p(r_ijt = 0 | c_ijt, β, θ_ij) = 1 − p(r_ijt = 1 | c_ijt, β, θ_ij). (2)

Equations 1 and 2 define the likelihood function by quantifying the probability of a correct or incorrect response based on the stimulus contrast c_ijt, the slope of the psychometric function β, and the parameters of the MCFF θ_ij. The overall probability of observing all the responses r_ij,1:T_ij for subject i in test j is determined by the product of the individual probabilities p(r_ijt | c_ijt, β, θ_ij) over all trials in that specific test:

p(r_ij,1:T_ij | θ_ij, β, c_ij,1:T_ij) = ∏_{t=1}^{T_ij} p(r_ijt | c_ijt, β, θ_ij). (3)
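In code, equations 1–3 reduce to a few lines. A minimal R sketch of the likelihood under the Weibull parameterization reconstructed above (the function names are ours):

    # Eq. 1a: probability of a correct response at contrast x, given the
    # log10 threshold xi and slope beta; gamma = 0.5, gamma_1.5 = 0.856.
    weibull_p <- function(x, xi, beta, gamma = 0.5, gamma15 = 0.856) {
      gamma + (1 - gamma) *
        (1 - exp(-(x / 10^xi)^beta * log((1 - gamma) / (1 - gamma15))))
    }
    # Eqs. 2-3: log likelihood of binary responses r at contrasts x for one
    # test, where xi is the vector of trial-by-trial log10 MCFF thresholds.
    mcff_loglik <- function(r, x, xi, beta) {
      p <- weibull_p(x, xi, beta)
      sum(r * log(p) + (1 - r) * log(1 - p))
    }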

Bayesian Inference Procedure (BIP)
The Bayesian Inference Procedure (BIP) is used to estimate the posterior distribution of θ_ij and β, p(θ_ij, β | D_ij), from the trial-by-trial data D_ij = {(c_ij,1:T_ij, r_ij,1:T_ij)} of subject i in test j via Bayes' rule (Figure 2b):

p(θ_ij, β | D_ij) = p(D_ij | θ_ij, β) p_0(θ_ij, β) / ∫∫ p(D_ij | θ_ij, β) p_0(θ_ij, β) dθ_ij dβ, (4)

where p(D_ij | θ_ij, β) is the likelihood term defined by equations 1–3, and p_0(θ_ij, β) is the prior probability distribution of θ_ij and β. In this application, the prior of θ_ij is set as a uniform distribution for each dimension k:

p_0(θ_ij,k) = U(θ_0,min,k, θ_0,max,k), (5)

where θ_0,min,k and θ_0,max,k are the lower and upper bounds of dimension k (Table 2). The denominator of equation (4) is an integral across all possible values of θ_ij and β.
In the BIP, β is set to 2 for all subjects, based on the results from the HBMv and HBMc, to allow more stable estimates of θ_ij, because β is difficult to estimate from the data of a single subject with a limited range of performance levels; the MCFF parameters are estimated independently for each subject.
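For intuition, the normalization in equation 4 can be approximated on a discrete parameter grid. A toy R sketch for the K = 2 model, using mcff_loglik from the sketch above (grid ranges and resolution are arbitrary; r and x are the response and contrast vectors of one subject):

    # Toy grid approximation of the BIP posterior (eq. 4) for the K = 2 MCFF.
    grid <- expand.grid(b  = seq(-1, 1, length.out = 51),
                        r2 = seq(-0.5, 0.5, length.out = 51))
    loglik <- apply(grid, 1, function(th) {
      xi <- th["b"] - th["r2"] * log10(seq_along(r))  # K = 2 learning curve
      mcff_loglik(r, x, xi, beta = 2)                 # beta fixed at 2, as in the BIP
    })
    post <- exp(loglik - max(loglik))
    post <- post / sum(post)   # discrete analogue of the denominator in eq. 4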
Table 2. Lower and upper bounds of the priors of the MCFF parameters.

Hierarchical Bayesian Model with Variance (HBMv)

The Hierarchical Bayesian Model with Variance (HBMv) is a three-level hierarchical Bayesian model used to estimate the joint posterior MCFF hyperparameter and parameter distribution across all subjects, without considering covariance within and between subjects (Figure 2c). This model includes probability distributions at three levels: the population level, the subject level, and the test level.
Population Level: The probability distribution of η_k, the kth dimension of the MCFF hyperparameter η at the population level, is modeled as a mixture of Gaussian distributions N with mean μ_k and standard deviation σ_k, which in turn have distributions p(μ_k) and p(σ_k):

p(η_k) = N(η_k; μ_k, σ_k) p(μ_k) p(σ_k). (6a)

The probability distribution of η, p(η), is the product of the probability distributions across all dimensions of η:

p(η) = ∏_{k=1}^{K} N(η_k; μ_k, σ_k) p(μ_k) p(σ_k). (6b)

Subject Level: The probability distribution of ρ_ik, the kth dimension of the hyperparameter ρ_i of subject i at the subject level, is modeled as a mixture of Gaussian distributions with mean η_k and standard deviation δ_k, with distributions p(η_k) and p(δ_k):

p(ρ_ik | η_k) = N(ρ_ik; η_k, δ_k) p(η_k) p(δ_k), (7a)

in which ρ_ik is conditioned on η_k. The conditional probability distribution of ρ_i, p(ρ_i | η), is the product of the conditional probability distributions across all dimensions of ρ_i:

p(ρ_i | η) = ∏_{k=1}^{K} N(ρ_ik; η_k, δ_k) p(η_k) p(δ_k). (7b)

Test Level: The probability distribution of the parameters θ_ij is conditioned on ρ_i: p(θ_ij | ρ_i).

The probability of obtaining all the data is computed using probability multiplication, which involves all levels of the model and the likelihood function based on the trial data:

p(D | X) = ∏_{i=1}^{I} ∏_{j=1}^{J} [∏_{t=1}^{T_ij} p(r_ijt | c_ijt, β, θ_ij)] p(θ_ij | ρ_i) p(ρ_i | η) p(η), (9)

where K = 2, 6, and 14 for the three versions of the MCFF, and X = (θ_1:I,1:J, ρ_1:I, η, μ, σ, δ, β) are all the MCFF parameters and hyperparameters in the HBMv.
Bayes' rule is used to compute the joint posterior distribution p(X | D), which includes all MCFF parameters and hyperparameters (Kruschke, 2014; Lee, 2011):

p(X | D) = p(D | X) p_0(μ) p_0(σ) p_0(δ) p_0(β) / ∫ p(D | X) p_0(μ) p_0(σ) p_0(δ) p_0(β) dX,

where the denominator is an integral across all possible values of X and is a constant for a given dataset and HBMv; p_0(μ_k), p_0(σ_k), p_0(δ_k), and p_0(β) are the prior distributions of μ_k, σ_k, δ_k, and β, with p_0(μ_k) = U(θ_0,min,k, θ_0,max,k), where θ_0,min,k and θ_0,max,k are defined in Table 2; Γ(a, b) is a Gamma distribution with shape parameter a and rate parameter b.
The HBMv estimates the joint posterior distributions of MCFF hyperparameters and parameters of all tests and subjects without considering covariance within and between subjects, while sharing a common slope parameter β across all tests and subjects.
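As a concrete, deliberately simplified illustration of this hierarchy, the following JAGS model string implements the reduced K = 2 MCFF with the subject and test levels collapsed (J = 1). The data names (resp, contr, nT, lb, ub) and the uniform and Gamma prior settings are placeholders, not the exact specification used in our fits; note that JAGS parameterizes dnorm by precision rather than standard deviation:

    hbmv_model <- "
    model {
      beta ~ dunif(1, 5)                  # common psychometric slope
      for (k in 1:2) {
        mu[k]  ~ dunif(lb[k], ub[k])      # population mean (bounds from Table 2)
        tau[k] ~ dgamma(1, 1)             # population precision (1 / sigma^2)
        del[k] ~ dgamma(1, 1)             # subject-level precision (1 / delta^2)
        eta[k] ~ dnorm(mu[k], tau[k])     # population hyperparameter, eq. 6a
      }
      for (i in 1:I) {
        for (k in 1:2) {
          theta[i, k] ~ dnorm(eta[k], del[k])   # subject parameters, eq. 7a
        }
        for (t in 1:nT[i]) {
          # K = 2 MCFF: general learning only
          xi[i, t] <- theta[i, 1] - theta[i, 2] * log(t) / log(10)
          # Weibull psychometric function (eq. 1a) with guessing rate 0.5
          p[i, t] <- 0.5 + 0.5 *
                     (1 - exp(-pow(contr[i, t] / pow(10, xi[i, t]), beta) *
                              log(0.5 / 0.144)))
          resp[i, t] ~ dbern(p[i, t])     # trial-by-trial likelihood, eqs. 2-3
        }
      }
    }"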

Computing the joint posterior distribution
We utilized the R (R Core Team, 2003) function run.jags with JAGS (Plummer, 2003) to generate representative samples of θ_ij,k (k = 1, 2, …, K) in three Markov Chain Monte Carlo (MCMC) chains for each subject i in the Bayesian Inference Procedure (BIP), through a random walk procedure (Kruschke, 2014). Each chain produced 5,000 kept samples (with a thinning ratio of 10) after a burn-in phase of 5,000 steps and 5,000 adaptation steps. Similarly, we computed 5,000 kept representative samples (with a thinning ratio of 10) of the joint posterior distribution of θ_ij (K × 60 parameters), ρ_i (K × 60 parameters), η (K parameters), μ (K parameters), σ and δ (K parameters each), and β (1 parameter) in three MCMC chains for the HBMv after a burn-in phase of 5,000 steps and 5,000 adaptation steps. Additionally, we calculated 5,000 kept samples (with a thinning ratio of 10) of the joint posterior distribution of θ_ij (K × 60 parameters), ρ_i (K × 60 parameters), Σ ((K × K + K)/2 parameters), μ (K parameters), Σ_ρ ((K × K + K)/2 parameters), and β (1 parameter) in three MCMC chains for the HBMc after a burn-in phase of 500,000 steps and 5,000 adaptation steps. The number of burn-in steps was determined to ensure convergence, following the Gelman and Rubin diagnostic rule (Gelman & Rubin, 1992). A model is considered "converged" when the between- and within-MCMC variance ratios for all parameters are less than 1.05. This ratio is calculated as the variance of samples across MCMC processes divided by the variance of samples within each MCMC process.
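The sampler settings above map directly onto run.jags arguments. A sketch, assuming the model string hbmv_model and the data names from the previous section:

    library(runjags)
    library(coda)
    fit <- run.jags(model    = hbmv_model,
                    monitor  = c("theta", "eta", "mu", "beta"),
                    data     = list(resp = resp, contr = contr, nT = nT,
                                    I = 60, lb = lb, ub = ub),
                    n.chains = 3,
                    adapt    = 5000,   # adaptation steps
                    burnin   = 5000,   # burn-in steps (500,000 for the HBMc)
                    sample   = 5000,   # kept samples per chain
                    thin     = 10)     # thinning ratio
    # Gelman-Rubin convergence check: psrf < 1.05 for all monitored parameters
    gelman.diag(as.mcmc.list(fit), multivariate = FALSE)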
We applied the three modeling procedures to both pre-training and training data utilizing the most saturated model (K = 14). For the HBMc, the two reduced models (K = 2, 6) were also fit.
Additionally, the most saturated HBMc was fit to augmented data, which included the full datasets of the other 59 subjects. Predicted learning curves were generated for a sample subject (i = 15) under various scenarios, including no data, one session of data, or two sessions of data, and these predictions were compared with the actual learning curve of this subject.

Statistical analysis
We initially estimated θ_ij for all subjects using the BIP, HBMv, and HBMc, regardless of training accuracy and feedback conditions. In this study, j was set to 1 since each subject underwent testing only once. Subsequently, we computed the estimated trial-by-trial threshold learning curve, ξ_i1t, from the estimated θ_i1 for each subject, and unblinded the data to compute group-level statistics of the estimated learning curve for each group g.

Goodness of fit
We assessed and compared the goodness of fit among the three methods using the Bayesian predictive information criterion (BPIC) (Ando, 2007, 2011), which quantifies the likelihood of the data based on the joint posterior distribution of MCFF parameters while also penalizing for model complexity.
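In one commonly used approximation (following Ando, 2011), BPIC can be computed from the posterior samples of the deviance, doubling the complexity penalty of the DIC. In the R sketch below, deviance_samples and deviance_at_mean are hypothetical names for the monitored deviance and the deviance evaluated at the posterior mean:

    dbar <- mean(deviance_samples)     # posterior mean deviance
    pD   <- dbar - deviance_at_mean    # effective number of parameters
    bpic <- dbar + 2 * pD              # BPIC; DIC would be dbar + pD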

Mean and Standard Deviation of ξ_i1t
The posterior distribution of ξ_i1t was constructed by computing the learning curve (Figure 1) from samples of the posterior distribution of θ_i1. To measure variability or uncertainty at the test level for the BIP, HBMv, and HBMc methods, we utilized the half width of the 68.2% credible interval (HWCI) of the posterior distribution of ξ_i1t (Clayton & Hills, 1993; Edwards et al., 1963).
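Concretely, the 68.2% HWCI of the threshold at a given trial can be computed from its posterior samples as half the distance between the 15.9% and 84.1% quantiles:

    # Half width of the 68.2% credible interval from posterior samples
    hwci <- function(samples) {
      as.numeric(diff(quantile(samples, c(0.159, 0.841)))) / 2
    }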

Group-level statistics
The posterior distribution p(θ^g) of the MCFF parameters of the learning curve of group g, θ^g, was constructed by (1) averaging each sample θ_i1 from the joint distribution at the test level across all subjects i in the group, and (2) repeating (1) 15,000 times. A one-way MANOVA was conducted on the means θ̄^g of p(θ^g) with the R function manova. Linear discriminant analysis (LDA) was conducted on p(θ^g) with the R function lda for post hoc comparisons. In each LDA, 90% and 10% random samples from p(θ^g) were used as training and test data, respectively. The LDA was repeated 100 times.
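A minimal R sketch of this pipeline, assuming a data frame groups_df whose rows are samples from p(θ^g) (columns th1, th2, … plus a group factor); the names are ours and the column set is abbreviated:

    # One-way MANOVA across groups on the group-level MCFF parameter samples
    fit <- manova(cbind(th1, th2, th3) ~ group, data = groups_df)
    summary(fit)
    # Post hoc LDA with a 90/10 train/test split, repeated 100 times
    library(MASS)
    acc <- replicate(100, {
      idx  <- sample(nrow(groups_df), 0.9 * nrow(groups_df))
      m    <- lda(group ~ ., data = groups_df[idx, ])
      pred <- predict(m, groups_df[-idx, ])
      mean(pred$class == groups_df$group[-idx])
    })
    mean(acc); sd(acc)   # average accuracy and standard deviation (cf. Table 8)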

RESULTS

Goodness of fit
The Bayesian Predictive Information Criterion (BPIC) values for the most saturated model (K = 14) were 125144, 125119, and 125047 for the BIP, HBMv, and HBMc, respectively.
These values indicated that the HBMc provided the best fit to the data among the three models.
For the HBMc, constraining the between-session forgetting heights and the asymptotic relearning levels to be the same across sessions (K = 6) increased the BPIC value to 125056. Retaining only general learning (K = 2) further increased the BPIC to 126286. As a result, we concluded that the HBMc with both sets of parameters varying across sessions (K = 14) produced the best fit to the data.
Comparing BIP, HBMv, and HBMc learning curves

Figure 3. Average trial-by-trial learning curves across all subjects in groups 1 to 6 (columns 1 to 6), with mean (lines) and standard error (shaded areas) from the three methods: BIP (blue), HBMv (olive), and HBMc (orange).

Figure 3 illustrates the mean and standard error of the average trial-by-trial learning curves for each of the six groups from the most saturated MCFF generated with the BIP, HBMv, and HBMc. The standard errors in Groups 3 and 4 are larger because they contained only six subjects each, while the other groups had 12 subjects each. Table 3 presents the average 68.2% half-width credible interval (HWCI; in log10 threshold units) of the learning curves in the six groups. The HBMc generated the most precise estimates of the learning curves, with the smallest average 68.2% HWCI in all groups. Based on these and the BPIC results, we focus on the HBMc with K = 14 in subsequent analyses.

Figures 4, 5, and 6 illustrate the posterior distributions of MCFF hyperparameters and parameters at the population, subject, and test levels obtained from the HBMc with K = 14, shown as two-dimensional projections between pairs of six representative dimensions (k = 1, 2, 3, 4, 9, 10) of the MCFF parameters, including only one of the between-session forgetting parameters and one of the asymptotic rapid relearning levels. The HBMc, with its incorporation of covariance hyperparameters, recovered the relationships between MCFF parameters among different subjects and tests. Correlations ranged from -0.47 to 0.64, -0.31 to 0.50, and -0.74 to 0.66 at the population, subject, and test levels, respectively (Tables 4, 5, and 6). In addition, the mean and 68.2% HWCI of the β posterior distribution were 2.00 and 0.041, respectively.

Group-level statistics
In order to conduct group-level analyses, we constructed the joint distributions of the MCFF parameters p(θ^g) at the group level. Figure 7 illustrates p(θ^g) of group 2. Table 7 shows the correlations of the θ^g components in that group. Again, the HBMc recovered the relationships between MCFF parameters at the group level. For groups 1 to 6, the correlations of the θ^g components ranged from -0.44 to 0.52, -0.31 to 0.50, -0.35 to 0.46, -0.37 to 0.55, -0.38 to 0.55, and -0.33 to 0.50, respectively. We also computed the mean and 68.2% HWCI of the marginal distributions of the general learning rate in each group. Although there was significant general learning in all groups, Group 1 exhibited a significantly lower learning rate (-0.026 ± 0.006) than the other five groups (-0.041 ± 0.005, -0.043 ± 0.010, -0.064 ± 0.009, -0.046 ± 0.006, and -0.044 ± 0.007).

Predicting the learning curve
Figure 8 illustrates the predicted learning curves for subject i = 15 with no data, one session, and two sessions of data. In this figure, the segments of the learning curves estimated from observed data are colored orange and the predicted segments yellow. The correlations between the predicted and observed learning curves (with all six sessions of data) were 0.841, 0.944, and 0.958 when there was no data, one session, and two sessions, respectively. The average 68.2% HWCI of the predicted learning curves was 0.278, 0.077, and 0.055 log10 units for the scenarios with no data, one session, and two sessions, corresponding to 1252%, 331%, and 238% increases in the 68.2% HWCI relative to the learning curve estimated with all six sessions of data. Both the accuracy and the reliability of the predicted learning curves increased with the amount of available data. Remarkably, the predicted learning curves from one and two sessions of data closely matched the observed learning curve for this subject.

DISCUSSION
We introduced the multi-component functional form (MCFF) from the non-parametric HBMc analysis (Zhao et al., submitted) along with the contrast threshold psychometric function as the generative model of trial-by-trial performance in perceptual learning. We developed three parametric inference procedures to estimate the parameters of the MCFF and, therefore, the trial-by-trial learning curve from two datasets that investigated the interaction between feedback and training accuracy in Gabor orientation identification.
The HBMc incorporated covariance hyperparameters at the population and subject levels, capturing the relationships between and within subjects, and resulted in the best fits to the trial-by-trial data and the most precise estimates of the learning curves. Among the HBMc solutions with different numbers of MCFF components, the HBMc with the most saturated version (K = 14) generated the best fit to the trial-by-trial data, aligning with the findings in our non-parametric HBMc analysis (Zhao et al., submitted) and extending them to the individual subject level.
MANOVA and LDA analyses of the joint distributions at the group level found that the low-training-accuracy group without feedback (Group 1) was significantly different from the other groups that received training either at a high accuracy and/or with feedback (Groups 2 to 6). We also found significant general learning in all groups, although Group 1 exhibited a significantly lower learning rate compared to the other five groups. These findings align with the results from the original studies (Liu et al., 2010, 2012) as well as the non-parametric HBMc analysis (Zhao et al., submitted), although both the trial-by-trial analysis with the parametric HBMc and the 20-trial-block analysis with the non-parametric HBMc were more sensitive in detecting the significant learning in Group 1, which the traditional adaptive staircase method failed to detect.
The HBMc provided population-level and individual-level posterior distributions of model parameters. This feature facilitated statistical inferences and predictions of the trial-by-trial learning curve. Even with minimal data, the predictions demonstrated a notable correlation with observed data, highlighting the HBMc's potential to enhance test efficiency.
Accurate predictions of the learning curve could hold significant implications, especially in fields requiring perceptual expertise. This includes occupations in which selecting individuals with the aptitude for efficient perceptual learning is crucial. Moreover, it opens avenues for optimizing training strategies and protocols, particularly in clinical applications, rehabilitation settings, and domains demanding perceptual expertise.
Training for perceptual expertise is often resource-intensive, both in terms of cost and time (Dosher & Lu, 2020). Therefore, the ability to selectively identify individuals with a higher likelihood of acquiring the necessary perceptual expertise efficiently has the potential to increase the overall success rate of training programs.
Although our primary focus has been on the estimation of the trial-by-trial learning curve, the methods developed here can also be used to study specificity and transfer in perceptual learning. Studies on specificity or transfer in perceptual learning have mostly relied on estimates of initial transfer, usually quantified as performance during the first assessment after a task switch. Some studies, though fewer in number, have examined learning rates following a task switch and have reported instances of accelerated learning or 'learning to learn' (Bejjanki et al., 2014; Kattner, Cochrane, Cox, et al., 2017; Z. L. Liu & Weinshall, 2000; R.-Y. Zhang et al., 2021). However, the granularity of analysis in these studies has often been coarse, potentially masking multiple underlying component processes that may be affected by prior learning. The parametric methods developed in this study could be used to recover trial-by-trial learning curves and the time course of transfer for each subject.
While some researchers have advocated for modeling perceptual learning as a continuous function of time-on-task, applying parametric functional forms such as exponential or power curves (Cochrane et al., submitted), our previous study (Zhao et al., submitted) revealed the intricacies of the learning process. This process extends beyond general learning, encompassing phenomena like between-session forgetting and within-session rapid relearning and adaptation.
In our prior work, we exclusively applied the non-parametric methods to analyze the contrast threshold learning curve in an orientation identification task. Moving forward, our strategy involves applying the non-parametric approach to diverse perceptual learning tasks. This broader application aims to identify the specific component processes at play in different tasks.
Subsequently, we plan to leverage the insights gained to apply the parametric methods presented in this study, enabling the estimation of trial-by-trial learning curves across various perceptual learning domains. Such finer-grained learning curves may be more sensitive to dynamic training and transfer effects, revealing a more nuanced and complete picture of perceptual learning.
Additionally, the Multi-Component Functional Form (MCFF), besides enabling the estimation of the trial-by-trial learning curve, may be valuable for enhancing adaptive assessment in perceptual learning testing procedures and for monitoring the time course of perceptual sensitivity change. In our previous work, we introduced the adaptive qCD method to assess perceptual sensitivity changes (e.g., dark adaptation, perceptual learning) with an exponential functional form (Zhao et al., 2019). The qCD showcased its efficacy in estimating trial-by-trial learning curves with high accuracy and precision compared to traditional staircase procedures (Zhang et al., 2019b). With the refined functional form provided by the MCFF, the qCD method can be further enhanced to achieve a more accurate estimation of the time course of learning.

Although we have developed the HBMc with the MCFF within the context of perceptual learning, the parametric framework can be harnessed to quantify learning curves in different learning domains. It can also be employed to improve estimates of human performance parameters, such as d′, response time, and threshold, across multiple time points in learning or longitudinal studies, or across diverse experimental conditions, such as different spatial frequencies in contrast sensitivity function (CSF) tests or varying temporal frequencies in assessments of temporal modulation functions.

In summary, the MCFF implemented in the parametric HBMc serves as a powerful tool for estimating trial-by-trial learning curves, revealing detailed time courses of the component processes in perceptual learning. Its versatility provides an effective framework for generating accurate and precise estimates as well as predictions of dynamic changes in human performance across experiments with hierarchical designs.

Zhao et al. (submitted) developed and applied three non-parametric inference procedures to the data from two studies that investigated the interaction between feedback and training accuracy in Gabor orientation identification over approximately 2000 trials across six sessions and estimated the learning curve with block sizes of 20, 40, 80, 160, and 320 trials. Analysis at the scale of 20 trials per block identified significant contributions from general learning, between-session forgetting, and rapid relearning and adaptation within sessions, resulting in an MCFF with four latent component processes: general learning, between-session forgetting, within-session re-learning, and within-session adaptation. An underlying learning curve (black curve) from the MCFF across six sessions is depicted in Figure 1, with the four component processes depicted in yellow, purple, olive, and orange. In the present study, we developed three parametric inference procedures to fit the MCFF to the trial-by-trial perceptual learning data in Liu et al. (2010, 2012): 1) Bayesian Inference Procedure (BIP): This method estimates the posterior distribution of the parameters of the MCFF for each subject independently. 2) Hierarchical Bayesian Model with Population, Subject, and Test Levels, with Variance but no Covariance Hyperparameters at the Population Level (HBMv): This model estimates the joint posterior MCFF hyperparameter and parameter distribution across all subjects, without considering the covariance within and between subjects. 3) Hierarchical Bayesian Model with Population, Subject, and Test Levels, Incorporating Covariance Hyperparameters at the Population Level (HBMc): This model estimates the joint posterior MCFF hyperparameter and parameter distribution across all subjects, considering the covariance within and between subjects.

Figure 2. (a) Psychometric function: A family of psychometric functions parameterized by log10 contrast threshold ξ_ijt and slope β. They serve as the likelihood functions in the inference procedures. (b) Bayesian inference procedure (BIP): The BIP is used to compute the posterior MCFF parameter distributions for each subject independently. θ_ij represents the MCFF parameters for subject i in test j. (c) Hierarchical Bayesian Model with variance hyperparameters (HBMv): The model calculates the joint distribution of MCFF parameters and hyperparameters across tests and subjects. It incorporates mean μ_k and standard deviation σ_k hyperparameters for the population and mean η_k and standard deviation δ_k hyperparameters for individual subjects. Notably, the δ_k are assumed to be the same for all subjects, where k indexes a single dimension of the MCFF parameters. (d) Hierarchical Bayesian Model with covariance hyperparameters (HBMc): The model estimates the joint distribution of MCFF parameters and hyperparameters across tests and subjects. It utilizes mean μ and covariance Σ hyperparameters for the population and mean η and covariance Σ_ρ hyperparameters for individual subjects. Σ_ρ is assumed to be the same for all subjects.

Hierarchical Bayesian Model with Covariance (HBMc)

Population Level: The probability distribution of the MCFF hyperparameter η, which consists of all MCFF component parameters at the population level, is modeled as a mixture of K-dimensional Gaussian distributions with mean μ and covariance Σ, which have distributions p(μ) and p(Σ):

p(η) = N(η; μ, Σ) p(μ) p(Σ). (12)

Subject Level: The probability distribution of the hyperparameter ρ_i of subject i at the subject level is modeled as a mixture of K-dimensional Gaussian distributions with mean η and covariance Σ_ρ, with distributions p(η) and p(Σ_ρ):

p(ρ_i | η) = N(ρ_i; η, Σ_ρ) p(η) p(Σ_ρ), (13)

in which ρ_i is conditioned on η.

Test Level: The probability distribution of the parameters θ_ij is conditioned on ρ_i: p(θ_ij | ρ_i).

The probability of obtaining the entire dataset is computed using probability multiplication, which involves all levels of the model and the likelihood function based on the trial data:

p(D_1:I,1:J | X) = ∏_{i=1}^{I} ∏_{j=1}^{J} [∏_{t=1}^{T_ij} p(r_ijt | c_ijt, β, θ_ij)] p(θ_ij | ρ_i) N(ρ_i; η, Σ_ρ) p(Σ_ρ) N(η; μ, Σ) p(μ) p(Σ) p(β).

Bayes' rule is used to compute the joint posterior distribution of X, which includes all MCFF parameters and hyperparameters. This computation involves integrating over all possible values of X:

p(X | D_1:I,1:J) = p(D_1:I,1:J | X) p_0(X) / ∫ p(D_1:I,1:J | X) p_0(X) dX,

where p_0(X) includes the priors of μ, β, and the precision matrices Ω = Σ^{-1} and Ω_ρ = Σ_ρ^{-1}. Ω and Ω_ρ are K × K precision matrices with Wishart prior distributions, with expected mean Σ_HBMv^{-1} and K + 1 degrees of freedom, where Σ_HBMv is the average covariance matrix computed from the estimated MCFF parameters across all subjects from the HBMv procedure. The HBMc estimates the joint posterior distribution of MCFF parameters and hyperparameters as well as β across all tests and subjects. Unlike the HBMv, the HBMc generates a joint posterior distribution in which MCFF component parameter estimates mutually constrain each other across tests and subjects. This allows for more robust and interconnected estimates of MCFF parameters and hyperparameters.
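A sketch of how the population and subject levels change in JAGS when covariance is introduced, using the multivariate normal (dmnorm) and Wishart (dwish) distributions on precision matrices. R_scale is a hypothetical K × K scale matrix; in JAGS, dwish(R, k) has expectation k R^{-1}, so setting R_scale = (K + 1) Σ_HBMv with K + 1 degrees of freedom gives a prior expectation of Σ_HBMv^{-1}. The trial-level likelihood is unchanged from the HBMv sketch:

    hbmc_fragment <- "
    model {
      for (k in 1:K) { mu[k] ~ dunif(lb[k], ub[k]) }
      # Wishart priors on the population and subject precision matrices;
      # with df = K + 1 and R_scale = (K + 1) * Sigma_HBMv,
      # E[Omega] = inverse(Sigma_HBMv)
      Omega[1:K, 1:K]  ~ dwish(R_scale[1:K, 1:K], K + 1)
      OmegaR[1:K, 1:K] ~ dwish(R_scale[1:K, 1:K], K + 1)
      eta[1:K] ~ dmnorm(mu[1:K], Omega[1:K, 1:K])         # population, eq. 12
      for (i in 1:I) {
        rho[i, 1:K] ~ dmnorm(eta[1:K], OmegaR[1:K, 1:K])  # subject, eq. 13
      }
      # ... test level and trial-by-trial Weibull likelihood as in the
      # HBMv sketch ...
    }"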

Figure 4. Posterior distributions of the hyperparameters η at the population level.

Figure 8. Mean (lines) and 68.2% HWCI (shaded areas) of the observed (orange) and predicted (yellow) trial-by-trial learning curves from the HBMc with (a) no, (b) one session, and (c) two sessions of data for subject i = 15. (d) The observed learning curve of subject i = 15 with all six sessions of data.


Table 1. Correspondence between θ_k and the original MCFF parameters for the three versions of the MCFF.

Table 3. The average 68.2% half-width credible interval (HWCI; in log10 threshold units) of the learning curves in the six groups.

Table 4. Correlations of the η components at the population level.

Table 5. Correlations of the ρ_i components, averaged across all subjects.

Table 8. Average accuracy (% correct) and standard deviation of the linear discriminant analyses.