A randomized response technique for reducing the effect of initial non-response of the regression estimator in panel surveys

The results of sample surveys play a vital role in decision making. One of the main problems that survey statisticians face during data collection is non-response, which may affect both the survey cost and the accuracy of the estimates. The problem becomes more severe if the survey contains sensitive questions, such as questions about family planning methods or drug use. To reduce the non-response that arises under the direct questioning (DQ) technique, Warner (1965) proposed an indirect survey technique known as the randomized response (RR) technique, which he developed for cross-sectional data. This well-known procedure produces more valid responses to sensitive questions in surveys. It removes the direct link between the respondent's answer and the sensitive question by means of a randomization device, thereby protecting the respondent's privacy, which in turn greatly increases the survey response rate. However, owing to the complex nature of panel estimators, comparable work is missing for panel surveys. To fill this gap, we propose a linear regression model for panel surveys (longitudinal studies) under the RR technique. We study the behavior of the proposed model through a simulation study.


Introduction
Whenever we talk about surveys or survey sampling, the first term that comes to mind is "non-response". Non-response not only reduces the number of cases in the data but may also lead to substantial bias in the estimates (of descriptive and causal parameters). This holds for cross-sectional surveys as well as panel surveys. In panel surveys, however, the sample size is further reduced by panel attrition, the successive drop-out of respondents in the waves after the initial one. The cumulative loss from initial non-response and attrition increases with the length of the panel. In general, if attrition is selective for the response, it may worsen the initial non-response bias in later panel waves. Conversely, if attrition is non-selective, or if the drop-out rate is low compared to the initial non-response, an initial non-response bias may shrink in later panel waves. Sisto (2003) and Rendtel (2013) made the interesting finding that, in panel surveys sampled from registers, the estimates obtained from the initially intended sample and from the respondent sample (respondents only) converged to the true solution in subsequent panel waves. They showed this for income quintiles and poverty states using a sub-sample of the European Community Household Panel (ECHP), and for poverty rates using the European Union Statistics on Income and Living Conditions (EU-SILC). Rendtel (2013) used the term "fade-away" for this behavior of the initial non-response. The idea of a fade-away of bias goes back to the earlier study of Fitzgerald et al. (1998), who found that, without any correction for initial non-response or attrition, the discrepancy between estimates from the Panel Study of Income Dynamics (PSID) and counts from the US Current Population Survey diminished in subsequent panel waves.
Motivated by the empirical studies of Sisto (2003) and Rendtel (2013), in which information about the non-respondents is taken from registers, Alho et al. (2017) extended the fade-away phenomenon to Markov chain models. The main tool in their influential article is the contraction theorem, which states that under some regularity conditions the distributional differences between two chains fade away over time. Alho and Rendtel (2020) conducted a similar study in the context of regression analysis. To explain the fade-away of the initial non-response bias, they proposed a linear panel model of general length (t = time = wave) with a single covariate. In their model the covariate and the error term are each decomposed into two parts, a permanent part and a time-varying part. They concluded that a large initial bias of the OLS estimates due to non-response fades away in the case of low dynamic components of the covariate and error terms. They also proposed an approximation formula for the bias of the OLS estimate of the true slope coefficient under the not missing at random (NMAR) mechanism at the start of a panel survey. However, the approximation formula becomes intractable in later panel waves.
To avoid this difficulty, Khan (2020) extended the results to longer panels using a simulation study. Using the linear response model of Alho and Rendtel (2020), Khan (2020) also concluded that if the time-invariant and dynamic components of the covariate and error terms are small, a large initial bias of the OLS estimates fades away in later panel waves. In the application part, Khan (2020) used income and satisfaction data from the SOEP, and the results are in line with the theoretical results of Alho and Rendtel (2020). Surveys in which respondents are questioned directly are cheaper and faster than indirect survey techniques. However, direct questioning may cause substantial non-response, which distorts the distribution of interest, e.g., from normal to non-normal, and can bias the statistical analysis. One of the main sources of substantial non-response in direct surveys is the sensitivity of the survey questions: respondents are reluctant to answer certain questions due to privacy concerns. Questions on topics such as abortion, drug use, tax evasion, black money, divorce, and plagiarism are considered sensitive. If a questionnaire contains such questions, direct survey methods will cause substantial bias in the estimates due to non-response.
To address this problem for cross-sectional data, Warner (1965) proposed an alternative, indirect survey technique known as the randomized response (RR) technique. The method reduces the large non-response arising under the DQ technique by protecting the respondent's privacy when answering a sensitive question, and it therefore allows interviewers to obtain more accurate information on sensitive issues. In this method the respondents use a randomization device, such as drawing a card, spinning a spinner, or tossing a coin, whose result is hidden from the interviewer. By introducing random noise into the responses through the randomization device, the RR method conceals the individual responses and protects the respondents' privacy. Privacy is of great concern to survey respondents. In direct response surveys their privacy is not protected, which results in a large amount of missing data due to non-response. Respondents may also give false answers, which lead to under- or overestimation of the true parameters. When respondents are certain that their privacy will be protected through randomization, on the other hand, they tend to answer more truthfully. This also applies to persons who would refuse or be reluctant to answer a sensitive question under the DQ technique, so the survey response rate increases. Since the main purpose of the RR technique is to increase the survey response rate, it yields more reliable estimates on sensitive issues than direct survey methods once the response rate has been increased through randomization. The basic idea of the Warner RR model is to estimate the proportion of respondents in a dichotomous population whose truthful response to a sensitive question would be "yes", without exposing the respondent's privacy to the interviewer and thereby avoiding social stigma or fear of reprisals.
To increase the survey response rate, Warner uses a randomization device, for example a deck of cards in which each card carries one of the following two statements:
• I belong to group A (the sensitive group),
• I do not belong to group A (the non-sensitive group).
Each survey participant performs a random trial: he or she draws a card and states whether the statement on it is true, e.g., (i) I never pay income tax (the respondent belongs to group A); (ii) I pay income tax (the respondent does not belong to group A). The two statements appear in the deck with relative frequencies $P$ and $1 - P$. These relative frequencies are also known as the design probabilities (DP), $0 \le P \le 1$; $P$ is the probability that the randomization device directs the respondent to the first statement. The purpose is to estimate $\pi_A$ ($0 < \pi_A < 1$), the population proportion of respondents who belong to the sensitive group A, i.e., the probability that a truthful answer to the first statement is "yes".
Under this setup, a sample of size $n$ is selected from a population of size $N$ by simple random sampling, with or without replacement. Each member of the selected sample is asked to draw a card at random from a pack of 52 ordinary playing cards. Because of the sensitivity of the survey question and the resulting privacy concerns, the respondent is advised not to show the card to the interviewer. Suppose that, out of the $n$ survey respondents, $n_1$ answer "yes" to the sensitive question. The number of "yes" answers $n_1$ is assumed to follow a binomial distribution with parameters $n$ and $P_y = P\pi_A + (1 - P)(1 - \pi_A)$, i.e., $n_1 \sim B(n, P_y)$.
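A minimal Python sketch of the card-drawing step may help to see where $P_y$ comes from. The function name, the seed, and the values $\pi_A = 0.2$ and $P = 0.7$ are illustrative assumptions and are not taken from Warner (1965).

```python
import random


def simulate_warner_yes_count(n, pi_a, p, rng):
    """Count "yes" answers under a Warner-type randomization device.

    pi_a is the assumed true share of group A and p is the design
    probability of the "I belong to group A" card (both illustrative).
    """
    n_yes = 0
    for _ in range(n):
        in_group_a = rng.random() < pi_a   # true status, hidden from the interviewer
        card_says_a = rng.random() < p     # card drawn, hidden from the interviewer
        # The respondent answers "yes" when the statement on the card is true.
        if in_group_a == card_says_a:
            n_yes += 1
    return n_yes


rng = random.Random(2024)
n = 20000
n_yes = simulate_warner_yes_count(n, pi_a=0.2, p=0.7, rng=rng)
p_y = 0.7 * 0.2 + (1 - 0.7) * (1 - 0.2)   # P*pi_A + (1 - P)*(1 - pi_A)
print(n_yes / n, p_y)
```

The observed share of "yes" answers should lie close to $P_y$, although no single answer reveals the respondent's group.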
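Warner obtained an unbiased maximum likelihood estimator of $\pi_A$, which exists for $P \neq 0.5$ and is defined by
$$\hat{\pi}_A(W) = \frac{n_1/n - (1 - P)}{2P - 1}, \qquad (1)$$
with variance
$$V(\hat{\pi}_A(W)) = \frac{\pi_A(1 - \pi_A)}{n} + \frac{P(1 - P)}{n(2P - 1)^2} = V(\hat{\pi}_A(D)) + \frac{P(1 - P)}{n(2P - 1)^2}, \qquad (2)$$
where the notation $D$ stands for the direct response (DR) survey and $V(\hat{\pi}_A(D)) = \pi_A(1 - \pi_A)/n$ is the variance of the estimator in a DR survey.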
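As a minimal sketch, Warner's estimator in its standard closed form, $(n_1/n - (1 - P))/(2P - 1)$, can be computed as follows. The function name and the numbers in the usage lines are illustrative assumptions.

```python
def warner_estimate(n_yes, n, p):
    """Warner (1965) estimator of the sensitive proportion pi_A.

    n_yes: number of "yes" answers, n: sample size,
    p: design probability of the sensitive statement (must differ from 0.5).
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("the estimator is undefined for P = 0.5")
    yes_share = n_yes / n
    return (yes_share - (1 - p)) / (2 * p - 1)


# Illustrative numbers: with pi_A = 0.2 and P = 0.7 the expected "yes" share
# is 0.7*0.2 + 0.3*0.8 = 0.38, so 380 "yes" answers out of 1000 map back to 0.2.
estimate = warner_estimate(n_yes=380, n=1000, p=0.7)
print(estimate)
```

The guard for $P = 0.5$ reflects the fact that the "yes" share then carries no information about $\pi_A$.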
From the above discussion, we conclude that estimates obtained with the Warner RR estimator are unbiased and closer to the true population parameters than estimates produced by DR surveys. Nevertheless, in indirect response surveys one has to pay a price in terms of variance, because RR techniques are more costly and put more burden on the respondents in exchange for protecting their privacy. Variance equation (2) shows that for $P = 0$ the variance of the estimator reduces to the variance of the DR survey, i.e., $V(\hat{\pi}_A(W)) = V(\hat{\pi}_A(D))$. For all other values of $P$ ($0 < P < 1$, $P \neq 0.5$) the variance of the Warner estimator is always larger than the variance of the estimator provided by the DR survey.
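The variance comparison can be checked numerically. The sketch below assumes the standard variance of Warner's estimator, namely the DR variance plus the term $P(1 - P)/(n(2P - 1)^2)$; the function names and input values are illustrative.

```python
def direct_variance(pi_a, n):
    """Variance of the sample proportion in a direct response survey."""
    return pi_a * (1 - pi_a) / n


def warner_variance(pi_a, n, p):
    """Variance of Warner's estimator: DR variance plus a randomization penalty."""
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("the variance is undefined for P = 0.5")
    penalty = p * (1 - p) / (n * (2 * p - 1) ** 2)
    return direct_variance(pi_a, n) + penalty


pi_a, n = 0.2, 1000          # illustrative values
for p in (0.0, 0.1, 0.3, 0.7, 0.9):
    print(p, warner_variance(pi_a, n, p), direct_variance(pi_a, n))
```

The penalty term vanishes at $P = 0$ and grows without bound as $P$ approaches 0.5, which is the price paid for the added privacy.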
After the pioneering work of Warner on the RR model, numerous alternative techniques have been incorporated into the standard RR models to estimate the mean response of respondents belonging to a sensitive group. For a more detailed introduction to the survey literature on RR models see, e.g., Greenberg et al. (1969), Moors (1971), Boruch (1972), Franklin (1989), Kuk (1990), Arnab (1996), Chaudhuri (2001), Fenton et al. (2001), Gjestvang and Singh (2006), and Van den Hout et al. (2007). These RR procedures estimate the mean response of respondents belonging to the sensitive group for a qualitative random variable (Bernoulli or multinomial). However, when the outcome variable is continuous and the objective is to estimate its average, these standard RR models are not appropriate. To fill this gap, Eichhorn and Hayre (1983) proposed a new RR estimator of the population mean of a sensitive quantitative response variable. Such models are very useful in studies where the response variable is extremely sensitive. Examples are studies that ask (i) not only whether a woman has had an abortion, but also how many abortions she has had; (ii) not only whether an individual has used condoms, but also how many times; (iii) not only whether a medical student has used illicit drugs, but also on how many occasions; (iv) not only whether an individual has received black money, but also how many times. More examples of this type can be found in Bar-Lev et al. (2004) and the references therein.
In their procedure, respondents are asked for their value of the sensitive quantitative response variable. They are also told that, if they regard the question as sensitive, they should respond with a coded value based on their true value. The coded (scrambled) response is obtained by multiplying the true value by a random number drawn from a given distribution, such as a normal or exponential distribution. Because of this randomization, the interviewer does not know the random number used by the respondent. However, he/she knows the parent distribution from which the random numbers for coding are generated.
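The scrambling step just described can be sketched in a few lines of Python. The sketch assumes that the true value and the random number are independent, so that the mean of the coded responses divided by the known mean of the scrambling distribution recovers the mean of the true values. The distributions, parameter values, and function names are illustrative assumptions only.

```python
import random
import statistics


def scramble(true_value, rng, mu_z, sigma_z):
    """Return the coded response: true value times a hidden random number."""
    return true_value * rng.gauss(mu_z, sigma_z)


def unscrambled_mean(coded_responses, mu_z):
    """Estimate the mean of the true values from coded responses."""
    return statistics.fmean(coded_responses) / mu_z


rng = random.Random(7)
mu_y, sigma_y = 50.0, 10.0   # hypothetical distribution of the sensitive variable
mu_z, sigma_z = 1.0, 0.2     # hypothetical scrambling distribution, known to the analyst
true_values = [rng.gauss(mu_y, sigma_y) for _ in range(20000)]
coded = [scramble(y, rng, mu_z, sigma_z) for y in true_values]
estimate = unscrambled_mean(coded, mu_z)
print(estimate)
```

Only the coded values and the parameters of the scrambling distribution are used in the estimate; the individual random numbers are never needed.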
To understand this procedure, let $Y$ be the sensitive random variable whose mean we want to estimate, and let $Z$ be another random variable that supplies the random numbers used for coding (scrambling) the response. The two variables are assumed to be independent, and the coded response provided by the respondent to the sensitive question is $Y^* = YZ$. Their expectations are denoted by $\mu_y = E(Y)$ and $\mu_z = E(Z)$, their variances by $\sigma_y^2 = V(Y)$ and $\sigma_z^2 = V(Z)$, and their coefficients of variation by $C_y$ and $C_z$. The coded response variable $Y^* = YZ$ then has expectation $E(Y^*) = \mu_y\mu_z$ and variance $V(Y^*) = \sigma_y^2\mu_z^2 + \mu_y^2(1 + C_y^2)\sigma_z^2$. Using the coded responses of $n$ survey respondents, Eichhorn and Hayre (1983) proposed the following unbiased estimator of the population mean $\mu_y$ of the uncoded response variable $Y$:
$$\hat{\mu}_{y(EH)} = \frac{\bar{Y}^*}{\mu_z}, \qquad \bar{Y}^* = \frac{1}{n}\sum_{i=1}^{n} Y_i^*,$$
where $\bar{Y}^*$ is the sample mean of the $n$ coded responses. The estimator $\hat{\mu}_{y(EH)}$ is unbiased, and its variance is
$$V(\hat{\mu}_{y(EH)}) = \frac{\sigma_y^2}{n} + \frac{\mu_y^2}{n}(1 + C_y^2)C_z^2,$$
which is larger than the variance of the estimator of the population mean under simple random sampling in an open or DR survey, i.e., $V(\bar{y}) = \sigma_y^2/n$. A more detailed discussion can be found in Eichhorn and Hayre (1983).
Bar-Lev et al. (2004) proposed an estimation procedure for the population mean of the sensitive variable that generalizes the Eichhorn and Hayre (1983) RR procedure. This RR procedure has several positive features, because it combines the mechanisms of two different models (the qualitative response model and the quantitative response model). First, it uses the randomizing parameter of the Warner (1965) model, the so-called "design parameter". Second, it uses the coding mechanism of the Eichhorn and Hayre (1983) RR model.
Because it combines these two RR features for randomizing the responses, the procedure produces quite promising results, which are more efficient than those of the Eichhorn and Hayre (1983) RR procedure in terms of lower variance. Consider again the variables $Y$ and $Z$ and a design parameter $P$ ($0 < P < 1$). Bar-Lev et al. (2004) assume that with probability $P$ the respondent reports the true value of the response variable $Y$, whereas with probability $1 - P$ he/she reports the coded (scrambled) value $YZ$. The individual response to a sensitive question can therefore be written as
$$Y^* = \begin{cases} Y & \text{with probability } P, \\ YZ & \text{with probability } 1 - P. \end{cases}$$
The coded response variable $Y^*$ has expectation $E(Y^*) = \mu_y(P + (1 - P)\mu_z)$ and variance
$$V(Y^*) = \mu_y^2\left[(1 + C_y^2)\left(P + (1 - P)\mu_z^2(1 + C_z^2)\right) - \left(P + (1 - P)\mu_z\right)^2\right].$$
For $P = 0$, this procedure reduces to the Eichhorn and Hayre (1983) RR procedure.
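The two-branch response layout can be sketched in Python as follows. Because $E(Y^*) = \mu_y(P + (1 - P)\mu_z)$, dividing the mean of the reported values by $P + (1 - P)\mu_z$ gives an estimate of $\mu_y$; this is the only property the sketch relies on. Function names, distributions, and parameter values are illustrative assumptions.

```python
import random
import statistics


def bbb_report(y, z, p, rng):
    """Report the true value y with probability p, the coded value y*z otherwise."""
    return y if rng.random() < p else y * z


def bbb_estimate(reported, p, mu_z):
    """Rescale the mean of the reported values by P + (1 - P)*mu_z."""
    return statistics.fmean(reported) / (p + (1 - p) * mu_z)


rng = random.Random(99)
mu_y, sigma_y = 50.0, 10.0   # hypothetical sensitive variable
mu_z, sigma_z = 2.0, 0.5     # hypothetical scrambling distribution
p = 0.3                      # hypothetical design parameter
reported = [
    bbb_report(rng.gauss(mu_y, sigma_y), rng.gauss(mu_z, sigma_z), p, rng)
    for _ in range(20000)
]
estimate = bbb_estimate(reported, p, mu_z)
print(estimate)
```

Setting `p = 0` makes every respondent report the coded value, so the rescaling factor becomes $\mu_z$ and the sketch collapses to the pure scrambling design.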
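According to Bar-Lev et al. (2004), an unbiased estimator $\hat{\mu}_y$ of $\mu_y$ is
$$\hat{\mu}_{y(BBB)} = \frac{\bar{Y}^*}{P + (1 - P)\mu_z},$$
with sampling variance
$$V(\hat{\mu}_{y(BBB)}) = \frac{\mu_y^2}{n}\left[C_y^2 + C_{zp}^{*2}(1 + C_y^2)\right],$$
where
$$C_{zp}^{*2} = \frac{P + (1 - P)\mu_z^2(1 + C_z^2)}{\left(P + (1 - P)\mu_z\right)^2} - 1.$$
More details about $C_{zp}^*$ can be found in Bar-Lev et al. (2004) and in more recent references on RR techniques. Alho and Rendtel (2020) extended the fade-away phenomenon to time series regression analysis. They proposed a linear panel model over time $t$ (= wave), which takes the form
$$Y_{i,t} = a_t + b_t X_{i,t} + e_{i,t}.$$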

The fade-away of bias in regression analysis
It is evident from the model that the relationship between the response variable $Y_{i,t}$ and the covariate (predictor) $X_{i,t}$ may vary over time $t$: $a_t$ is the intercept and $b_t$ is the slope coefficient. The disturbance terms $e_{i,t}$ satisfy the basic assumptions of the classical linear regression model, with means $E(e_{i,t}) = 0$ and constant variances $Var(e_{i,t}) = \sigma^2$. The covariate $X_{i,t}$ consists of two components, a fixed permanent component $M_i$ and a time-varying component $Z_{i,t}$:
$$X_{i,t} = M_i + Z_{i,t}.$$
These two components are assumed to be independent of each other, with $Var(M_i) = \kappa$ and $Var(Z_{i,t}) = 1 - \kappa$, where $0 \le \kappa \le 1$. In addition, the disturbance term $e_{i,t}$ has a variance component structure with a permanent component $V_i$ and a transient component $U_{i,t}$:
$$e_{i,t} = V_i + U_{i,t}.$$
By the same argument, these components of the error term are independent of each other, with $Var(V_i) = \gamma\sigma^2$ and $Var(U_{i,t}) = (1 - \gamma)\sigma^2$, where $0 \le \gamma \le 1$.
The covariate $X_{i,t}$ is assumed to be independent of the model disturbance terms $e_{i,t}$, which also means that $M$, $Z$, $V$, and $U$ are independent of each other.
As the covariate and the error term both have a time-varying component, these components are assumed to follow an autoregressive model of order one, AR(1), defined as
$$Z_{i,t} = \rho Z_{i,t-1} + \varepsilon_{i,t}$$
and
$$U_{i,t} = \phi U_{i,t-1} + \xi_{i,t},$$
where $\rho$ and $\phi$ are the autoregressive coefficients of the time-varying parts of the covariate and the residual, and where $\varepsilon_{i,t}$ and $\xi_{i,t}$ are the model "noise terms", which satisfy the assumptions of the classical linear regression model, i.e., $E(\varepsilon_{i,t}) = E(\xi_{i,t}) = 0$, with variances $Var(\varepsilon_{i,t}) = (1 - \rho^2)(1 - \kappa)$ and $Var(\xi_{i,t}) = (1 - \phi^2)(1 - \gamma)\sigma^2$, so that the variances of $Z_{i,t}$ and $U_{i,t}$ stay constant over time.
Regarding participation in the survey, it is assumed that at the beginning of the survey (wave 1 = time 1) each individual decides whether or not to participate.
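A short Python sketch may clarify how a permanent component and an AR(1) transient component combine. It assumes a stationary AR(1) process, i.e., the noise variance is set to $(1 - \text{coefficient}^2)$ times the transient variance so that the total variance is the same in every wave. The function name, the seed, and the parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate_component_paths(n, waves, perm_share, ar_coef, total_var, rng):
    """Simulate n paths of 'permanent + AR(1) transient' over several waves.

    perm_share is the share of total_var carried by the permanent part
    (kappa for the covariate, gamma for the error term); ar_coef is the
    autoregressive coefficient (rho or phi).
    """
    perm_sd = (perm_share * total_var) ** 0.5
    trans_sd = ((1 - perm_share) * total_var) ** 0.5
    noise_sd = trans_sd * (1 - ar_coef ** 2) ** 0.5   # keeps the variance constant
    paths = []
    for _ in range(n):
        permanent = rng.gauss(0.0, perm_sd)
        transient = rng.gauss(0.0, trans_sd)
        path = [permanent + transient]
        for _ in range(waves - 1):
            transient = ar_coef * transient + rng.gauss(0.0, noise_sd)
            path.append(permanent + transient)
        paths.append(path)
    return paths


rng = random.Random(42)
paths = simulate_component_paths(
    n=4000, waves=10, perm_share=0.5, ar_coef=0.5, total_var=1.0, rng=rng
)
var_first = statistics.pvariance([path[0] for path in paths])
var_last = statistics.pvariance([path[-1] for path in paths])
print(var_first, var_last)
```

Calling the function once with $(\kappa, \rho)$ and once with $(\gamma, \phi)$ would give the covariate and the error term of the panel model, respectively.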
Let $R_{i,1} = 1$ be the binary response indicator of person $i$ who is initially willing to participate in the survey, while $R_{i,1} = 0$ indicates non-response. Then, following the linear model of Alho and Rendtel (2020), the response probability of person $i$ is defined by
$$P(R_{i,1} = 1 \mid Y_{i,1}) = \alpha + \beta Y_{i,1}, \qquad (11)$$
where $0 < \alpha < 1$ and $0 < \beta < 1$ are the non-response parameters. The values of these parameters are selected such that the value of the response model (11) falls within the interval $[0, 1]$. Note that the response model depends on $Y_{i,1}$, which means that the data are not missing at random (NMAR): the distribution of the variable of interest $Y_{i,1}$ at the initial wave is highly selective for the response. The marginal probability of response in the initial wave is then approximated by
$$P(R_{i,1} = 1) \approx \alpha + \beta a_1,$$
where $a_1 = E(Y_{i,1})$. The following approximation results are due to Alho and Rendtel (2020).
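To illustrate why a response probability that depends on $Y_{i,1}$ makes the respondents selective, the following Python sketch assumes a response probability that is linear in $Y_{i,1}$ and clipped to $[0, 1]$. The parameter values $\alpha = 0.3$ and $\beta = 0.2$, the unit variances of the covariate and the error term, and the function name are hypothetical choices for illustration only.

```python
import random
import statistics


def response_probability(y1, alpha, beta):
    """Linear response probability in the wave-1 outcome, clipped to [0, 1]."""
    return min(1.0, max(0.0, alpha + beta * y1))


rng = random.Random(11)
alpha, beta = 0.3, 0.2       # hypothetical non-response parameters
# Wave-1 outcome: intercept 1 plus a covariate and an error term (unit variances assumed).
y_wave1 = [1.0 + rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0) for _ in range(5000)]
responded = [
    y for y in y_wave1 if rng.random() < response_probability(y, alpha, beta)
]
full_mean = statistics.fmean(y_wave1)
resp_mean = statistics.fmean(responded)
response_rate = len(responded) / len(y_wave1)
print(full_mean, resp_mean, response_rate)
```

Persons with large outcome values respond more often, so the respondent mean exceeds the full-sample mean, while the response rate stays close to $\alpha + \beta a_1$ with $a_1 = 1$.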
Alho and Rendtel (2020) give approximations for the OLS estimate of $b_t$ among those who initially participate in the survey and for the corresponding non-response bias. Conclusion: under the above conditions, the bias at time $t$ is always smaller than the bias at time $t - 1$, i.e., $Bias(b_{t,com}) < Bias(b_{(t-1),com})$ holds for all values of the time-invariant and dynamic components of the covariate and error terms in the intervals $0 < \kappa < \rho < 1$ and $0 < \gamma < \phi < 1$.

Model for response using RR technique
In this section, we introduce a linear model for the response under the RR technique. Non-response is present in almost every non-mandatory survey, but its effects vary across surveys depending on the nature of the survey and the type of questionnaire. If the survey questions are non-sensitive, almost everyone in the sample is willing to respond. If the questions are sensitive, however, a large part of the data is missing due to non-response. To reduce the effect of non-response in cross-sectional data, Warner (1965) proposed the RR technique, which improves the survey response rate through a randomization device. To reduce the effect of initial non-response, Alho and Rendtel (2020) proposed a time series model. However, they do not use a randomization device to control the effect of non-response. To fill this gap, we propose a linear model for the response in the context of panel surveys when the outcome variable is subject to RR. The proposed model exploits both the randomizing mechanism of the Warner (1965) randomized response model for qualitative data and the coding scheme of Bar-Lev et al. (2004) for quantitative data. We suppose that individuals are informed that their privacy will be fully protected through the RR technique and that they therefore agree to participate in the survey by answering the sensitive question(s) in the initial and subsequent waves of the panel.
Let $Y_{i,t}$ be the quantitative response variable of individual $i$ at time $t$, and let $S_i$ be the random variable that supplies the random number used for the coding mechanism. The response variable $Y_{i,t}$ is assumed to be uncorrelated with $S_i$. Also, let $0 \le P \le 1$ be the DP of the RR model, which is controlled by the experimenter during data collection and is used for randomizing the respondent's response. If the respondent regards the survey question as non-sensitive, he/she reports the true value $Y_{i,t}$, which happens with probability $P$. If the respondent regards the question as sensitive, he/she reports the coded value $Y_{i,t}S_i$, which happens with probability $1 - P$. A small value of $P$ indicates that the question is sensitive, so that most respondents report the coded value $Y_{i,t}S_i$. Let $R_{i,t}$ be the binary response indicator of person $i$, with $R_{i,t} = 1$ if the person agrees to respond at the start of the panel and $R_{i,t} = 0$ otherwise. Following the linear RR approach of Bar-Lev et al. (2004), the response of individual $i$ in the initial and subsequent panel waves can be represented as
$$Y^*_{i,t} = \begin{cases} Y_{i,t} & \text{with probability } P, \\ Y_{i,t}S_i & \text{with probability } 1 - P. \end{cases}$$
The probability of response of individual $i$ to a sensitive question can then be approximately described by the linear model
$$P(R_{i,1} = 1 \mid Y^*_{i,1}) = \alpha + \beta Y^*_{i,1}.$$
In the context of panel surveys, we follow Alho and Rendtel (2020) and compare the sample of initial respondents (Resp-Sample) with the full sample of respondents and non-respondents (Full-Sample). We further assume that the distribution of the variable of interest in the Resp-Sample is highly selective (skewed) for the response at the start of the panel. The initial non-response of person $i$ is assumed to depend on $Y_{i,1}$ or $Y_{i,t}S_i$. This assumption is automatically satisfied by the linear response model given in equation (11).
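The effect of the design probability on the selectivity of the respondents can be sketched as follows. The sketch assumes that the wave-1 response probability is linear in the reported value (true or coded) and clipped to $[0, 1]$, which is one reading of the model described above. The parameters $\alpha = 0.3$ and $\beta = 0.2$, the unit variances, and the function names are hypothetical; the scrambling variable is drawn from a normal distribution with mean 0 and variance 0.20.

```python
import random
import statistics


def rr_report(y, s, p, rng):
    """Reported value: the true y with probability p, the coded y*s otherwise."""
    return y if rng.random() < p else y * s


def respondent_mean(p, n, alpha, beta, rng):
    """Mean of the true wave-1 outcome among respondents for design probability p."""
    sd_s = 0.20 ** 0.5                      # standard deviation for variance 0.20
    kept = []
    for _ in range(n):
        y = 1.0 + rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0)   # assumed wave-1 outcome
        reported = rr_report(y, rng.gauss(0.0, sd_s), p, rng)
        prob = min(1.0, max(0.0, alpha + beta * reported))     # assumed response model
        if rng.random() < prob:
            kept.append(y)
    return statistics.fmean(kept)


rng = random.Random(5)
mean_direct = respondent_mean(1.0, 5000, 0.3, 0.2, rng)   # P = 1: direct questioning
mean_rr = respondent_mean(0.1, 5000, 0.3, 0.2, rng)       # P = 0.1: mostly coded answers
print(mean_direct, mean_rr)
```

Under these assumptions the respondent mean for $P = 0.1$ lies much closer to the full-sample mean of 1 than the respondent mean for $P = 1$, because the coded value carries little information about the true outcome.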
Therefore, if this response model holds, the initial distribution in the Resp-Sample is somewhat far from the distribution in the Full-Sample. If there is no further drop-out in the following panel waves, then according to the bias approximation formula of Alho and Rendtel (2020) in Section 2 and the simulation study of Khan (2020), the distorting effect of initial non-response on the distribution of $Y_{i,t}$ in the Resp-Sample has to fade away over time $t = 2, 3, 4, \ldots, T$ as the distribution approaches the steady-state distribution of the Markov chain. Alternatively, if the response model depends on neither $Y_{i,1}$ nor $Y_{i,t}S_i$, there is no bias in the estimates from the Resp-Sample, i.e., the non-response is missing at random (MAR). When there is no bias in the Resp-Sample, the distributions of the Resp-Sample and the Full-Sample are the same. This assumption holds true for small values of the DP (for details see the figures).
Therefore, we assume that the initial non-response is NMAR, i.e., highly selective for the estimation of population parameters, so that the initial distribution of the Resp-Sample at the start of the panel is somewhat far from the distribution of the Full-Sample. If there is no further drop-out after wave 1, then according to the bias approximation formula proposed by Alho and Rendtel (2020) in Section 2, the distorting effect of initial non-response on the distribution of $Y_{i,t}$ in the Resp-Sample has to fade away over time $t = 2, 3, 4, \ldots, T$.

Monte Carlo simulation study
In the previous section, we discussed the fade-away of the initial non-response bias theoretically. In this section, we study it by simulation using the linear panel model $Y_{i,t} = a_t + b_t X_{i,t} + e_{i,t}$, where $a_t = 1$ for the intercept and $b_t = 1$ for the slope coefficient. We generated a synthetic population of 1,000 observations from the model in each of 100 Monte Carlo replications. For the data generation, we assume that the covariate $X_t = M + Z_t$ and the error term $e_t = V + U_t$ are normally distributed. The variance components $M$ and $Z_t$ of the covariate $X_t$ are uncorrelated, with expectations $\mu_m = \mu_{z_t} = 0$ and variances $\sigma_m^2 = \kappa$ and $\sigma_{z_t}^2 = 1 - \kappa$, respectively.
Similarly, the residual term $e_t$ is decomposed into two uncorrelated components, the permanent component $V$ and the transient component $U_t$, with expectations $\mu_v = \mu_{u_t} = 0$ and variances $\sigma_v^2 = \gamma\sigma^2$ and $\sigma_{u_t}^2 = (1 - \gamma)\sigma^2$, respectively.
Further, we assume that the white noise terms $\varepsilon_t$ and $\xi_t$ of the AR(1) models are independent and identically normally distributed with expectation zero and with variances $\sigma_\varepsilon^2 = (1 - \rho^2)(1 - \kappa)$ and $\sigma_\xi^2 = (1 - \phi^2)(1 - \gamma)\sigma^2$, which keep the variances of $Z_t$ and $U_t$ constant over the waves. In addition, we assume that the random variable $S$ used for the coding mechanism also follows a normal distribution with expectation $\mu_s = 0$ and variance $\sigma_s^2 = 0.20$, i.e., $S \sim N(0, \sigma_s^2 = 0.20)$.
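The data-generating process described in this section can be sketched in Python for the full sample, i.e., without any non-response. The sketch assumes $\sigma^2 = 1$ and stationary AR(1) components; these choices, the seed, and the function names are illustrative assumptions and do not reproduce the exact simulation code of the study.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_full_sample_slopes(n, waves, kappa, gamma, rho, phi, sigma2, rng):
    """Per-wave OLS slopes for Y = 1 + 1*X + e on the full sample."""
    sd_eps = ((1 - rho ** 2) * (1 - kappa)) ** 0.5            # noise of Z_t
    sd_xi = ((1 - phi ** 2) * (1 - gamma) * sigma2) ** 0.5    # noise of U_t
    m = [rng.gauss(0.0, kappa ** 0.5) for _ in range(n)]               # permanent part of X
    v = [rng.gauss(0.0, (gamma * sigma2) ** 0.5) for _ in range(n)]    # permanent part of e
    z = [rng.gauss(0.0, (1 - kappa) ** 0.5) for _ in range(n)]
    u = [rng.gauss(0.0, ((1 - gamma) * sigma2) ** 0.5) for _ in range(n)]
    slopes = []
    for t in range(waves):
        if t > 0:   # AR(1) update of the transient parts after wave 1
            z = [rho * zi + rng.gauss(0.0, sd_eps) for zi in z]
            u = [phi * ui + rng.gauss(0.0, sd_xi) for ui in u]
        x = [mi + zi for mi, zi in zip(m, z)]
        y = [1.0 + xi + vi + ui for xi, vi, ui in zip(x, v, u)]
        slopes.append(ols_slope(x, y))
    return slopes


rng = random.Random(123)
slopes = simulate_full_sample_slopes(
    n=1000, waves=10, kappa=0.5, gamma=0.5, rho=0.5, phi=0.5, sigma2=1.0, rng=rng
)
print([round(s, 3) for s in slopes])
```

Without non-response the slopes scatter around the true value 1 in every wave; note that Python's `gauss` takes a standard deviation, so a variance such as 0.20 has to be converted with a square root before drawing $S$.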
The fade-away hypothesis strongly depends on the size of the permanent and transient components of the covariate and error terms and on the size of the DP ($P$). We therefore check its behavior for various stabilities of the model covariates. Two cases arise. In the first case, we assign equal weights to the permanent and transient components of the covariate and the error term; this is the equal stability scenario. In the second case, we assign different weights to these components; this is the unequal stability scenario. All these cases are thoroughly discussed in Section 2. In this section we consider only the equal stability scenarios, denoted Scenario A to Scenario C:
• Scenario A: Low stability κ = γ = ρ = φ = 0.10,
• Scenario B: Medium stability κ = γ = ρ = φ = 0.50,
• Scenario C: High stability κ = γ = ρ = φ = 0.90.
To make the results easier to read, we multiply the relative bias (RB) and the mean squared error (MSE) by 100.
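The scenario settings and the two performance measures can be written down compactly in Python. The sketch assumes the usual definitions, relative bias as the mean deviation from the true value divided by the true value and MSE as the mean squared deviation, each multiplied by 100; the exact formulas of the study are not reproduced here, and the names and example numbers are illustrative.

```python
import statistics

# Equal stability scenarios: kappa = gamma = rho = phi takes a single value.
SCENARIOS = {"A": 0.10, "B": 0.50, "C": 0.90}


def relative_bias_percent(estimates, true_value):
    """Relative bias over Monte Carlo replications, multiplied by 100."""
    return 100.0 * (statistics.fmean(estimates) - true_value) / true_value


def mse_times_100(estimates, true_value):
    """Mean squared error over Monte Carlo replications, multiplied by 100."""
    return 100.0 * statistics.fmean((e - true_value) ** 2 for e in estimates)


# Hypothetical slope estimates from two replications, true slope b_t = 1.
example_estimates = [1.35, 1.39]
rb = relative_bias_percent(example_estimates, 1.0)
mse = mse_times_100(example_estimates, 1.0)
print(SCENARIOS, rb, mse)
```

In a full study the list of estimates would hold one OLS slope per Monte Carlo replication for a given wave, scenario, and design probability.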

Results discussion
This paper uses OLS to estimate the slope coefficients of a linear regression model in which the response variable is sensitive and subject to RR. The bias of the OLS estimates under the RR technique is compared with the bias of the estimates under the DQ technique. The general idea of the study is that, when the outcome variable is sensitive, RR techniques provide more accurate results than DQ survey techniques. Therefore, in this article, we use the RR survey technique to obtain survey information on sensitive issues. This technique improves response through the randomization device. In general, the RR technique performs well compared with direct surveys; see, e.g., Lensvelt-Mulders et al. (2005). This is also confirmed by the results of our simulation study, which show that the proposed RR technique provides accurate estimates. To check the size of the initial bias and its fade-away in follow-up waves, we restrict our analysis to 10 panel waves. Ten waves are usually sufficient to check the behavior of the estimator and its fade-away effect.
Surveys in which a large part of the data is missing due to non-response suffer not only from a reduced sample size but also from possible bias in the estimates. There are many reasons for a high non-response rate, as discussed earlier. One of them is questions on sensitive issues such as drug use, abortion, watching pornography, black money, or divorce. If such questions are included in the questionnaire, respondents are reluctant or afraid to answer. As a result, the response rate is low or, in other words, the non-response rate is high. High non-response not only reduces the number of cases in the data but can also bias the analysis.
To improve the response rate, Warner (1965) proposed an alternative survey technique known as the randomized response (RR) technique, which improves the survey response rate by protecting respondents' privacy through a randomization device. To check the size of the estimated slope coefficients and their fade-away over time, we run a simulation study for various values of the DP (denoted by $P$) of the Warner model, which varies in the interval $0 \le P \le 1$. For $P = 1$ the Warner RR technique is equivalent to the DQ technique: respondents are asked to answer sensitive questions directly, without any randomization device that protects their privacy. We therefore expect a high non-response rate for large values of $P$, say $0.5 \le P \le 1$, and a low non-response rate for small values of $P$, say $0.1 \le P \le 0.5$. Small values of $P$ ($P \le 0.5$) indicate that the respondents' privacy is well protected; in other words, respondents answer sensitive questions mostly with coded values.
As the Full-Sample consists of both the respondents and the non-respondents, there is no bias in the estimates under this sample. This holds for all simulation scenarios (Scenarios A to C). We denote the bias of the estimates under the Full-Sample by "Bias Full", shown in green. The results of our simulation study are summarized in Figures 1 to 3.
The Y-axis shows the bias (relative bias in percent) of the estimate, and the X-axis shows the wave of the panel. The colored circles in the graphs show the relative bias of the estimates obtained from our regression model in a specific wave of the panel. The graphical results demonstrate that for large values of $P$ the bias of the estimates is substantially larger than for small values of $P$. The fade-away effect strongly depends on the stability of the model parameters ($\kappa$, $\gamma$, $\rho$, $\phi$). For example, in Figure 1 (Scenario A) the bias of the estimate in wave 1 is 37% for $P = 1$ (blue), while for $P = 0.8, 0.5, 0.3, 0.2, 0.1$ it is 37% (orange), 35% (red), 25% (brown), 14% (pink), and 2% (black), respectively. The bias is smallest at $P = 0.1$, because the respondents then have the highest chance of scrambling their responses through the RR technique.
Regarding the fade-away effect, we see that small/large initial non-response bias of the OLS [...] This is clearly shown in Figure 2. Nevertheless, the initial bias of the estimates and its selection effect on the estimates decrease geometrically in follow-up panel waves. After some waves, however, the effect of non-response on the estimated slope coefficients remains permanent. This reflects the fixed permanent component of the error term, which is related to non-response.
Finally, Scenario C (high stability: κ = γ = ρ = φ = 0.90) is the most extreme scenario, in which the AR covariate and the AR error term of the model are highly correlated with their previous values. As discussed above, the fade-away of the bias depends on the size of the fixed permanent components of the covariate and the error term. If the permanent components are large, their distribution persists and the effect of initial non-response does not decline in subsequent panel waves, so no fade-away is present for Scenario-C. This can also be checked from the bias approximation formula of Alho and Rendtel (2020) and the simulation study of Khan (2020). The situation is completely different for the same Scenario-C under the RR technique, as the results illustrated in Figure 3 show: the estimated slope coefficients of the simulation study swing into the steady-state distribution of the Markov chain.
The reason for this change is clear: with the RR procedure, respondents are more confident that their privacy on a sensitive question is protected. More respondents therefore participate in the survey and give truthful answers, and as a result the non-response bias fades away in later panel waves. Overall, our results are completely in line with the theoretical results of Alho and Rendtel (2020), the simulation results of Khan (2020), the theoretical results of the Warner (1965) model, and the coding mechanism in Bar-Lev et al. (2004).
Concerning privacy protection, the design parameter plays a central role in protecting the respondent's privacy on a sensitive survey question.
As can be seen from the probability of response given in equation 1, a small value of P means that the privacy of respondents is well protected. For P = 0.1, almost 90% of the survey respondents protect their privacy through the RR technique.
Thus, if a survey contains sensitive questions, the RR technique provides consistent parameter estimates, but it does not guarantee unbiasedness, as a bias of 2% is always present in the "Resp-Sample" estimates.
For the "Resp-Sample", the analysis is affected by non-response to an extent that depends on the size of the DP and of the stability parameters (κ, γ, ρ, φ). In all scenarios, the figures show that the "Resp-Sample" estimates are less affected by bias for low values of P, say P = 0.10 (10%), and greatly affected for all other values of P. The MSE of the estimates is therefore very small for P = 0.10 and substantial for all values P > 0.10. Again, this is true for Scenarios A-C. The fade-away effect is present in all figures, i.e., the MSE tends to decline over time. The fade-away is fast in Figure 4 (Scenario-A: κ = γ = ρ = φ = 0.10), because the regression models are only weakly correlated (10%) across cross-sections and across time periods.
For Scenario-B (Figure 5) the fade-away is still strong, owing to the medium correlation across cross-sections and across time periods. The fade-away in Scenario-C (Figure 6) is clearly slower than in Scenarios A and B (Figure 4 and Figure 5). The reason is that the regressions in this scenario vary little over time, which slows the fade-away of the MSE. As expected from the theoretical results of Alho, with low stability of the covariates and the error term the OLS estimates have the smallest bias, which in turn leads to the smallest MSE. The efficiency of the OLS estimator is therefore highest in Scenario-A. It is reduced considerably in Scenarios B and C, where the permanent and time-varying parts of the covariate and error terms are of medium or high size (see Figure 5 and Figure 6).
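The link between bias and MSE used in this argument follows from the standard decomposition MSE = bias² + variance. A minimal sketch of how it can be checked on Monte Carlo replicates is given below; the function name and the example values are hypothetical, and the variance is computed with divisor n so that the identity holds exactly.

```python
def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) of replicated estimates of true_value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    # Variance around the mean estimate, with divisor n
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / n
    # Mean squared error around the true value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse

# Three hypothetical slope estimates for a true slope of 1.0
bias, variance, mse = mse_decomposition([1.0, 2.0, 3.0], 1.0)
```

Because the variance of the estimates is comparable across values of P for a given sample size, a smaller bias translates directly into a smaller MSE in this decomposition.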
Overall, we conclude that the speed of the fade-away of the bias/MSE depends strongly on: i) the size of the permanent/temporal components of the covariates and of the model error term; ii) the DP: if its value is small, we expect a substantial reduction not only in the initial-wave estimates but also in the follow-up panel waves; iii) the fast turn-over covariates (

Compliance with ethical standards
Conflict of interest I declare that I have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals.