Forecasting and Controlling Key Performance Indicators in Call Centers

This paper proposes a methodology for modeling and controlling the performance of call centers. Most call centers use CRM (Customer Relationship Management) systems to record data of all contacts between agents and clients. These data may be autocorrelated. To model autocorrelated processes effectively, the proposed methodology integrates in a logical way ARIMA (Autoregressive Integrated Moving Average) modeling and SPC (Statistical Process Control) tools. ARIMA is used to model the process and identify the model that best fits the time series. The fitted model is used to compute residuals, predict future values for the quality variable(s) being monitored and determine the prediction errors. To achieve these goals, the Box-Jenkins methodology is employed. These outputs are then used to apply SPC , in this case the Shewhart control charts for autocorrelated data. First, the computed residuals are used to build the control charts in Phase I of SPC , verify the process stability and estimate the process parameters. Then, these parameters are used to establish the control limits of the charts used in Phase II of SPC to monitor and control the prediction errors. The proposed methodology is tested in a case study of a large call center in Portugal. The results of the case study suggest that ARIMA modeling and SPC , when properly integrated, provide a set of effective tools for monitoring call center performance when autocorrelated data are available. This paper has important implications for both theory and practice.


1.Introduction
Performance measurement is a research topic that has attracted considerable academic and practitioner attention, in both the manufacturing and service industries. In service industries, the evaluation of call center performance has been extensively investigated, given its importance in the decision-making process of managers of call centers. Performance measurement can help managers know whether the interactions between agents and clients are conducted effectively. Managers can learn from these experiences and use the evidence systematically to improve the quality of the service they offer. Call centers can be broadly defined as facilities designed to support the delivery of an interactive service via telephone communication (Excoffier et al., 2016).
To evaluate and control call center performance in a reliable way, it is important to secure on-line real-time measurements of Key Performance Indicators (KPIs). To achieve this, call centers have been replacing traditional information systems with Customer Relationship Management (CRM) systems. These systems can collect and record, in real-time, all calls and log all agents' and customers' activities in a single system, producing a massive amount of data. These data can be collected from different call center units, which creates new challenges for the way call center managers control the performance of their agents. Collecting and recording massive amounts of data is not enough, on its own, to inform policies and decision making in any organization. One challenge call center managers face is how to analyze and transform these data into information that adds value and supports their decision-making in real-time and in an effective way. Another challenge is how to use historical activities and customer experiences to predict future events, and take action to avoid future problems and bottlenecks.
A third challenge is more technical than strategic. As is known, automated systems, such as CRM, record all measurements of call center KPIs. As a result, the probability of producing autocorrelated data is higher. Thus, these data must be properly modeled to ensure reliable analysis and prediction of the way the performance of call center agents evolves over time. This can be achieved by applying advanced statistical tools that are effective in the modeling of autocorrelated data. Autoregressive Integrated Moving Average (ARIMA) modeling is a popular and flexible class of prediction techniques that use historical data to forecast time series that are autocorrelated (Khashei et al., 2012, Villalobos et al., 2021. After predicting the trend of KPIs in call centers, managers need to monitor the quality of service in order to ensure customers are satisfied and identify potential areas for improvement. Attention to the quality of customer service is a basic condition for attracting new customers and retaining current customers. Statistical Process Control (SPC) is an approach that has been widely applied in many industries to improve quality (Özdemir, 2021, Yang andZhou, 2015). It is a powerful collection of problem-solving tools that is useful in achieving process stability and improving capability through the reduction of variability. It involves the collection, analysis, and interpretation of data from quality monitoring activities (Montgomery, 2020). Control charts are one of the powerful SPC tools and be used to monitor the evolution of one or more quality characteristics, i.e., to monitor whether a process is in control or not Turgut, 2021, Zan et al., 2020). They can provide information about a process mean and variance and detect the presence of special causes of variation, enabling quick intervention to eliminate these causes and bring the process into a state of control where the average proportion of out-of-spec units is below some specified level (Montgomery, 2020, Quesenberry, 1997).
An analysis of the current literature reveals that, although performance measurement of call centers has been extensively investigated (see Table 1), there are no studies that apply ARIMA modeling and SPC to predict and control call center KPIs in situations where autocorrelated data are generated. Those studies that provide guidelines to evaluate call center performance do not explain how to monitor the performance of call centers over time, or how to deal with autocorrelated data (e.g. (Baraka et al., 2013, Baraka et al., 2015, Ma et al., 2011). This paper proposes an innovative methodology to forecast and monitor the performance of call centers' agents, and quickly detect pattern changes due to abnormal situations. The proposed methodology integrates ARIMA modeling and SPC tools to address the issue of autocorrelated data explicitly.
The remainder of this paper is organized as follows. In Section 2 we provide the theoretical background to call center performance, ARIMA modeling and SPC, with emphasis on Shewhart control charts. Section 3 presents the proposed methodology and explains the steps necessary to implement it. In Section 4, the applicability of the proposed methodology is tested in a case study of a large call center in Portugal. This section presents the data, and discusses the results. Lastly, in Section 5, conclusions and suggestions for future work are presented.

Theoretical background 2.1 Call center performance measurement
High competition in the market has made performance improvement one of the main concerns of all businesses including call centers. KPIs are critical indicators to measure the performance of an organization or a special process. KPIs have been used for a long time in call centers and have been described in several research studies (Mehrbod et al., 2017). KPIs should only be changed if a company's primary objectives also change. However, it is important to use KPIs which are completely relevant and clearly show the results, to know if the call center is on the right track.
Over the years, several studies of performance measurement in call centers have been conducted. These studies have identified several metrics that are widely used to evaluate call center performance in different domains and from different perspectives, such as service quality, agent performance, process efficiency, emotional labor, working conditions, job satisfaction, and so on. These KPIs and their definitions are summarized in Table 1.  (Valle et al., 2012) To propose an approach to predict the performance of sales agents in a call center dedicated exclusively to sales and telemarketing activities Logged hours, talked hours, effective contacts, finished records Naïve Bayesian classifier (Garcia et al., 2012) To investigate customers' satisfaction with telephone waiting time Waiting time and customer satisfaction Discriminant analysis (Chen et al., 2011) To analyze first call resolution performance in order to produce decision rules First call resolution Rough set theory (Cheong et al., 2008) To identify the KPIs affecting customer satisfaction with a call center's services In-depth interviews to senior managers (Laureani et al., 2010) To explore the application of lean six sigma in call centers First-call resolution ratio, operator turnover and streamlining of processes Lean Six Sigma methodology (Robinson and Morley, 2006) To investigate call center management from the perspective of managers Customer satisfaction index, level of service, staff turnover rate, abandonment rate, wrap-up time (post call work), call duration or average handling time, occupancy rate and number of calls per agent Survey of call center managers, followed by indepth interviews (Tuten and Neidermeyer, 2004) To investigate the role of optimism and its effect on stress in call centers Personal orientation towards optimism, perceptions of job stress, work/non-work conflict, absenteeism and intent to turnover Survey approach (Lewig and Dollard, 2003) To assess the relationship between the emotional demands associated with call center work and call centre worker well-being Emotional demands, psychosocial demands, rewards, autonomy, social support, emotional exhaustion, job satisfaction Survey of call center workers (Grebner et al., 2003) To analyze working conditions, well-being, and job-related attitudes among call center agents.
Working conditions, well-being, and job-related attitudes Hypothesis testing (multivariate and univariate analysis of variance) Table 1 shows that there is no shortage of performance indicators that can be used in the analysis of call center performance. Since more comprehensive and reliable analysis of call center performance is strategic for large call centers, it is important to evaluate which KPIs are more relevant, and more or less efficient for the company. Some of the most important KPIs used by call center managers are provided in Table 2, together with their definitions. Average time a customer has to wait before contacting an agent; The time a caller waits on the line (Parmenter, 2015, Ma et al., 2011 First Call Resolution The percentage of calls received by the call center in which the customer had their needs met during the first contact without having to transfer, escalate or return the call; The percentage of calls closed on the first connect (Ma et al., 2011, Jouini et al., 2013 Average Answer Time The time between answering a call by agent and terminating the call (Feinberg et al., 2000) Abandonment Rate The proportion of people who disconnect during the operation, either before they are answered by an agent or during a call, usually because they have to wait too long during the transfers; The percentage of callers who disconnect prior to being answered (Feinberg et al., 2000, Ma et al., 2011 Average Hold Time The average time that the caller stays on hold by agent for searching in the system/web or getting help from supervisor (Parmenter, 2015) Hold Abandonment Rate The proportion of customers who were on hold but left the call because of the delay in service (Mehrbod et al., 2017) After Call Work The average time it takes each agent to terminate the process after disconnecting the contact (Feinberg et al., 2000) Number of Answered Calls The number of calls answered by agents in a period of time such as per hour or per day (Parmenter, 2015) Customer Satisfaction The number of customers, or percentage of customers, whose reported experience with a firm, its products or its services (rating) exceeds specified satisfaction goals. (Farris et al., 2010)

Shewhart control charts
A control chart is a graph of the values of a statistic of a given quality characteristic that has been measured or computed from a sample plotted against the sample number or time (Montgomery, 2020). The general form of control charts was first developed by Dr. Walter A. Shewhart at Western Electric's Bell Telephone Laboratories beginning in about 1924. A classical Shewhart control chart consists of a center line (CL) and two control limits, one on either side of the CL, called the upper control limit (UCL) and lower control limit (LCL). The CL represents the average value of the quality characteristic corresponding to the in-control state, and the other two lines are chosen to ensure that, if the process is in-control, nearly all of the sample points will fall between them (Kaya et al., 2017) and exhibit a random pattern. That is, only chance causes are present. One of the fundamental assumptions in the development of the classical Shewhart control chart is the Normality of the underlying distribution of the quality characteristic. Thus, if ω is the sample statistic that measures some quality characteristic of interest with mean and standard deviation , then the CL, the UCL and the LCL are given by Equation 1 (see (Montgomery, 2020)).
where k is the distance of the control limits from the centerline, expressed in standard deviation units.
When ω is a Normal random variable, and the process is in-control, the probability that it will fall above the upper control limits is 0.00135 and is the same that it will fall below the lower control limit, resulting in significance level α of 0.27%. Thus, it is possible to conclude that ⁄ = . = 3. As a result, the control limits provided in Equation 1 are written as shown in Equation 2.
Classical Shewhart control chart implementation involves two phases, with different objectives. In Phase I, the main objectives are to estimate the process parameters and to establish the control limits to be used in Phase II. In this phase, the control limits are determined based on the m subgroups and the data plotted on the control charts. Points that are outside the control limits are investigated and excluded, since, when a process is in-control, the control limits reflect only common causes of variation. This procedure is iteratively repeated because sometimes this type of retrospective analysis will require several cycles until achieving control limits that represent an in-control process performance. Then, the control chart designed with the control limits of Phase I is used to monitor the process by comparing the sample statistic for each successive sample as it is drawn from the process to the control limits (Montgomery, 2020). Unlike the action taken in Phase I, when a point falls outside the control limits in Phase II, it should not be eliminated and new limits should not be computed. Instead, the cause(s) of the out-of-control situation(s) should be investigated as soon as possible in an effort implement corrective actions and bring the process to the in-control state. This approach avoids the production of nonconforming units in the near future .
Another important aspect in the application of classical Shewhart control charts is the validation of two underlying assumptions. One of them is the Normality assumption, mentioned above. This assumption is easily satisfied when the control charts are developed using sample statistics from subgroups of four or more units. According to the Central Limit Theory, for most probability distributions, the distribution of the sample means is approximately normal for sample sizes of four or more units, even when the individual measurements do not follow a normal distribution. The second assumption, which is more difficult to satisfy and may contribute to a poor performance of the Shewhart control charts, is the one of independence. According to (Montgomery, 2020), the existence of quality characteristics that exhibit even low levels of correlation over time is the main reason that classical Shewhart control charts do not work well. (Montgomery, 2020) also emphasizes that even in situations where the normality assumption is violated to a slight or moderate degree, the control charts will still work reasonably well if the independence assumption is satisfied. Unfortunately, the assumption of independent observations is usually not satisfied in situations where consecutive measurements of quality variables are performed (e.g. chemical processes) or in situations where measurements are performed using automated information systems that collect and record all generated data (e.g. call centers). The implementation of classical Shewhart control charts when the independence assumption is not satisfied will give misleading results, with too many false alarms being produced. In other words, the control chart will indicate many out-of-control situations when the process is in an in-control state. One approach that can help to overcome this issue, is to integrate ARIMA modeling to remove the autocorrelation from the data and then apply control charts (see Section 3.2).
Classical Shewhart control charts can be grouped into two categories, depending on the type of quality characteristic to be controlled. Quality characteristics that are measured on a continuous scale (e.g. times, volumes, weights, diameters, dimensions) are called variables, and are controlled through control charts for variables. When dealing with this type of quality characteristic, since the process dispersion does not only depend on the central tendency (Pereira and Requeijo, 2012), it is a standard practice to implement two control charts, one to control the mean value of the quality characteristic and other to control its variability. The process average is typically controlled using a control chart for the mean, called the ´ chart. The process variability is usually controlled using a control chart for the standard deviation (S chart) or a control chart for the range (R chart). In situations where the sample used for process control consists of an individual unit (n = 1), a control chart for individual measurements (X chart) is used for controlling the process average . In this situation, the process variability is controlled through a control chart for the moving range(MR chart). It is a standard practice to use the moving range of two successive observations to estimate the process variability, where each moving range is defined as On the other hand, characteristics that cannot be conveniently represented in continuous scale are called attributes (e.g. number of non-conforming units, number of defects per unit), and are controlled through control charts for attributes. These include control charts for fraction nonconforming (p-chart), control charts for number of nonconformities (np chart), control charts for number of defects (c-chart) and control charts for number of defects per unit of product (uchart) (for details, see (Montgomery, 2020, Pereira andRequeijo, 2012)).
In this paper, the quality variable to be controlled is the Average Hold Time (AVGHT), which is continuous. Measurements of this variable are performed automatically for all calls. As a result, X and MR charts are the rational choice, but applied to residuals and prediction errors (see Section 4.3).

Control charts for autocorrelated data
As explained in Section 2.2, classical Shewhart control charts are applicable when the measurements of the quality characteristic are independent. If the process data exhibit autocorrelation, at least three approaches are possible (Pereira and Requeijo, 2012): (1) use the classical Shewhart control charts, CUSUM or EWMA, but with modified control limits to take account of autocorrelation in the process, (2) determine the mathematical model that best fits the autocorrelated data and build the control charts for residuals or prediction errors (Shewhart, CUSUM or EWMA), and (3) implement specific charts such as the Moving Centre-line EWMA (MCEWMA) chart or the EWMAST (EWMA for stationary processes).
In this research, the second approach is chosen, namely the application of Shewhart control charts to residuals and prediction errors, based on individual measurements. This approach, which is based on time series models, has proved useful in dealing with autocorrelated data (Requeijo and Cordeiro, 2013). It consists of modeling the correlative structure of the time series and use that model to remove the autocorrelation from the data, and apply control charts to the residuals and prediction errors (Montgomery, 2020, Pereira andRequeijo, 2012). The random variables represented in the control charts are not the individual measurements X but the residuals (Phase I) and the prediction errors (Phase II) (Montgomery, 2020, Pereira andRequeijo, 2012). To study the time series of the collected data and determine a mathematical model that best fits this series, we use ARIMA modeling (see Section 3.1), using the method described by (Box et al., 2015), which is briefly described in Section 3.3. In most cases, after removing the autocorrelation from the data, the corresponding residuals will be normally and independently distributed, satisfying the two assumptions underlying the implementation of Shewhart control charts. This forms the rationale for choosing the second approach to deal with autocorrelated data.

The proposed methodology
Call centers' KPIs have been modeled using different techniques, as shown in Table 1. Despite some of these techniques can be used for predictions, the main advantage of the methodology proposed in this paper is the ability to deal with autocorrelated data. By applying ARIMA modeling, the autocorrelation in the dataset can be effectively modeled and removed. Another advantage is the ability to predict and monitor call centers' KPIS, and also to detect pattern changes. In the following sections, the proposed methodology is explained in detail.

ARIMA Modeling
As mentioned in Section 2.3, one way to remove inherent autocorrelation in the process data and apply Shewhart control charts in a reliable way is to fit time series models to the quality characteristics to be controlled, using the method described by (Box et al., 2015). ARIMA modeling is a widely used and flexible technique that is used to specify a model for time series data that contain autoregressive (AR), differencing, and moving average (MA) components (see (Box et al., 2015)), i.e. time series that are autocorrelated. In short, a time series is autocorrelated if the value of variable X at time t depends on the past values lagged by one or more periods. In such situations, ARIMA is a powerful technique to model the process and determine the mathematical model that best fits the time series under analysis. The specified mathematical model is then used to extract the inherent autocorrelation and then use residuals for forecasting.
A general expression for an ARIMA model is where, In Equations 3 -7, B is the backward shift operator, is the backward difference operator, d is the differencing order needed to make the process stationary, Xt is the measurement at time t, εt is the residual or error at time t, ( ) is the autoregressive polynomial of order p and ( ) is the moving average polynomial of order q.
To model a process using ARIMA, we must determine the ARIMA (p, d, q) model that best fits the time series. A standard approach to achieve this goal is to compare the Estimated Autocorrelation Function (EACF) with the theoretical Autocorrelation Function (ACF) and the Estimated Partial Autocorrelation Function (EPACF) with the theoretical Partial Autocorrelation Function (PACF).
A practical way to perform these comparisons is to design their correlograms and analyze the behavior of the lags. This procedure enables the researcher to identify an appropriate model of the correlative structure of the time series. One important requirement is that the series must be made stationary before we can usefully interpret the correlogram (Chatfield, 2000). When this requirement is satisfied, the process can be modeled as an AR(p), an MA(q) or an ARMA (p,q).
Cutoff after lag p, i.e. sudden decays to zero, from lag q + 1 Tails off, i.e., decays exponentially or as a damped sine wave to zero Tails off, i.e., decays exponentially or as a damped sine wave to zero In the equations provided in Error! Reference source not found., ξ is the parameter that determines the process mean, ∅ the order parameter j of the AR or ARMA model, the order parameter j of the MA or ARMA model, and the variance of residuals.
Once the model that best fits the time series is identified, it is possible to compute the residual values et at each time t, as the difference between the value of the quality variable at time t (Xt) and the expected value at the same time t ( ^ ). It is also possible to estimate the values of the process parameters, which correspond to Phase I of SPC. In Phase II, as the ARIMA model that best fits the process data is known, it is possible to predict the future values of the quality variable under analysis. Using the results of these predictions and the new data, the researcher can determine the prediction errors and use them to build the control chart of prediction errors (see Section 3.2).

Shewhart control charts for residuals and prediction errors
After modeling a process with autocorrelated data, determining the residuals and verifying their independence, the Shewhart control charts can be designed, for Phase I. In this case, the random variable is the residuals instead of the quality variable X. The control charts are designed depending on the nature of the collected data. If samples are collected, the sample of size n, collected at time t, is constituted by the measurements of the quality variable X in that time t, given by Xt1, Xt2, …, Xtn. This sample, in turn, gives rise at the same time t, to a sample constituted by the n residuals et1, et2, …, etn. Based on this sample of n residuals, statistics such as the sample mean ´ , the sample range and the sample standard deviation can be computed. On the other hand, when individual measurements are collected, the statistics to be considered at time t to control the process mean and the process dispersion are the residual and the moving average , respectively. As mentioned in Section 2.2, a standard practice for computing the moving range is to use two successive observations, in this case, two successive residuals, as shown in Equation 8.
Once the sample statistics are computed, ´ and for mean and standard deviation charts, ´ and for mean and range charts respectively, or and for individual measurements and moving range charts, the corresponding average values ´, ´, ´, ´, and ´ are determined and used to establish the control limits for Phase I of SPC, as illustrated in Table 4Error! Reference source not found.. Table 4Error! Reference source not found. also provides the estimator for the standard deviation of the residuals, ^ .  Table 5Error! Reference source not found.. The process parameters μ and ξ, used in those equations are computed according to the equations provided in Table 5Error! Reference source not found., where, is the correlation coefficient of lag j and is the auto-covariance of lag j.
After replacing the points corresponding to special causes of variation, it is necessary to compute the new residuals, and update the control charts limits. In situations where a significant number of out of control points is found, it is necessary to investigate the root causes and implement appropriate corrective actions to achieve the process stability. Even if process stability is achieved, before starting the Phase II of SPC, capability analysis must be performed in order to check if the process can produce according to the technical specifications. If this requirement is not achieved, the root causes must be identified, and corrective action taken.
Once the process is stable and is shown to be capable, Phase II of SPC can be initiated. The objective of Phase II is to monitor the evolution of the quality variable to quickly detect the occurrence of abnormal patterns and take appropriate action. The method to monitor future values of the quality variable, in autocorrelated processes, consists of implementing Shewhart control charts applied to prediction errors. In these control charts, the statistics represented in the charts are the prediction errors, defined by Equation 9.
The distribution of prediction errors is characterized by an expected value of zero and a variance defined by Equation 10.
In Equations 9 and 10, ( ) is the prediction error for time + , is the value of X at time + , ^ ( ) is the prediction for time + , performed at time t, and are coefficients determined from = ( ) . where the standard deviation of the prediction errors ( ) is given by ( ) .

Box-Jenkins methodology to build an ARIMA model for call centers KPIs
To identify the mathematical model that best fits the autocorrelated times series investigated in this paper and build the control charts for residuals or prediction errors, the (Box et al., 2015)' methodology has been applied. This classical approach has been successfully employed in different fields to model autocorrelated time series. Building an ARIMA model using this method is usually best achieved by a three-step iterative procedure based on identification, estimation and diagnostic checking, as shown in Fig 1. This three-step iterative procedure enables the researcher to identify the mathematical ARIMA model that best fits the time series, eliminate the inherent autocorrelation in the time series and then use the model to predict future values of the quality variable, and determine the prediction errors. 1. ARIMA modeling: In this phase, the first task is to check if the time series is stationary, (see the rationale in Section 3.1). This goal is achieved by plotting and analyzing the time series (t, Xt), which may reveal non-stationary behavior of the process parameters, i.e., mean, variance or both. The second task is to determine the ARMA(p,q) model that best fits the time series. As mentioned in Section 3.1, the best ARIMA model can be identified by comparing the behaviors of the EACF with the ACF and the EPACF with the PACF.
The potential behaviors of the ACF and PACF are summarized in Table 3Error! Reference source not found.. The third task is to estimate the parameters ∅ , , ξ, and . After estimating these parameters, it is necessary to examine their adequacy, i.e., to check if they meet a set of requirements, namely those related to the normality and independence of residuals (see Section 2.2). It is important to test if the model is stationary and invertible, if the estimated coefficients are statistically significant and are independent of each other.

Model fitting:
It is the nature of real-time series to have noisy data and outliers. In fact, when an ARIMA model is fitted to a time series, the residuals of the outlier observations are usually higher. To find these outliers, control charts are helpful. By applying control charts to the residuals of the fitted model, it becomes easy to identify data points that do not belong to the distribution of residuals and any other abnormal patterns in the evolution of the residuals. If the control chart shows a significant number of out of control situations, this indicates that the identified model was not well-fitted to the time series. After finding the outliers and/or the out of control points, they must be "removed" from the time series in order to have a clean dataset. Then, a new model that properly fits the cleaned dataset must be identified, since replacing some data points probably changes the model fitted previously. Then a new model is fitted to the new time series. This iterative process continues until the pattern of the residual's evolution exhibits an in-control state. 3. Monitoring: After ensuring that the previous requirements are satisfied, we can conclude that the identified model is the most appropriate to model the process under study. The last phase in building an ARIMA model is to monitor the prediction errors, which implies the prediction of future values of the quality characteristic and the determination of the corresponding prediction errors (see Section 3.2).

Results and discussion
In this section, the methodology based on control charts for autocorrelated data, described in Section 3.3, is applied to forecast and control a call center critical-to-quality (CTQ) variable, namely the AVGHT. This KPI has been chosen because it is considered of great importance to improve the customers satisfaction. In the following section, we describe the steps conducted to build the best ARIMA model for forecasting and controlling this KPI.

Data Collection and Pre-processing
The data used in this paper were provided by an inbound call center of a big company in Portugal. They were automatically collected and stored by a CRM system. This database includes interactions for one month of call center activity. The data contains more than 175 thousand records of inbound calls recorded call by call. The database stores data on 4 important call center KPIs, namely Average Answer Time, Average Hold Time, After Call Work and Customer Satisfaction for each call. "Customer Satisfaction" is an attribute data with binary feature, "Good" or "Bad". This feature specifies the customer view about their conversation, which is solicited by the call center agent at the end of the call. However, in many cases customers refuse to give feedback at the end of the conversation, resulting in missing required data on the "Customer Satisfaction". In the first step of data preparation, the records of the calls with missing data were removed from the dataset. Since a significant number of clients that refused to give their feedback was found in the dataset, the KPI "Customer Satisfaction" has not been modeled in this research.
Moreover, the methodology proposed in this paper is not applicable to attribute data such as "Customer Satisfaction".
Regarding the KPI "After Call Work", we analyzed the dataset and concluded that more than 95% of calls lasted less than 16 seconds and, approximately 98% of them lasted less than 24 seconds. Therefore, we consider that the potential improvements for this KPI would not be significant. Consequently, we excluded this KPI from the study. "Answer Time" and "Hold Time" are not only dependent on the agent's performance but also on the type of clients and their questions. In fact, a preliminary descriptive statistics analysis of the individual measurements revealed a significant variation in the dataset of these two KPIs. This suggests that these KPIs have a variation that is very difficult to predict and perhaps their measurements do not follow Normal distributions. To confirm this suspicion, we applied the Kolmogorov-Smirnov (K-S) test and, for a significance level = 5%, we rejected the null hypotheses in both cases. Therefore, we concluded that observations regarding these two KPIs do not follow Normal distributions. To achieve a Normal distribution for these two KPIs, we computed a new statistic based on their average. It is common practice in call centers to track and monitor the KPIs based on their average. Due to the multiplicity of agents in the call centers, building a model for each agent is timeconsuming and may not be feasible in terms of resource management. Consequently, call centers' managers usually track the averages and then, if any abnormal situation is reported, they go future into detail to identify the root causes and take corrective actions. In this research, each average was based on samples data of one working hour. Table 7 shows the descriptive statistics of Hold Time and Answer Time, before and after averaging. As can be seen in Table 7, the standard deviation of both variables decreases significantly after averaging.

ARIMA modeling for Average Hold Time
In this section, ARIMA modeling is used to predict and control the behavior of AVGHT, as mentioned in Section 4. The model is built using a sample of 260 working hours. To predict the future values of this KPI, 40 working hours have been used. All the steps to apply ARIMA modeling to the AVGHT were performed in STATISTICA 12. Finally, the prediction results were compared with the real data to evaluate the model adequacy.
As explained in Section 3.3, in the first step we plotted the time series of the 260 observations of the AVGHT to confirm that the series was stationary. An analysis of Fig 2 reveals that the resulting time series is stationary (differentiation level is zero). As a consequence, an AR (p), MA(q) or ARMA (p,q) can be applied to model the time series (Shumway and Stoffer, 2017) and remove the autocorrelation from the data.  The analysis of the EACF and EPACF provided in Fig 3 reveals that there are significant correlations in the time series and therefore the classical Shewhart control charts could not be applied directly. It also reveals that ACF decays as a damped sine wave to zero and the PACF cuts off, both after lag 2. As a result, the time series under analysis can reasonably be modeled using an ARIMA (2,0,0) or simply a AR(2) (see the rationale in Table 3Error! Reference source not found.). The estimators of the AR(2) parameters were obtained from Statistica, and their values are provided in Table 8.   The parameters provided in Table 8 are used to determine the expected values of X, at each time t, which are then used to compute the residuals used to fit the AR(2) model identified previously.

ARIMA model fitting
After modeling the process, the identified ARIMA model must be fitted. As explained in Section 3.3, real-time series usually include noisy data and outliers. These outliers must be removed from the time series since they do not belong to the distribution that characterizes the time series. A standard approach to find out these outliers and/or any other unusual patterns is to apply control charts. Since the time series under analysis exhibits significant autocorrelation, classical Shewhart control charts cannot be directly applied, as concluded in Section 4.2. In this situation, control charts for residuals should be built, provided that they are independently and normally distributed, that is, they are white noise. The corresponding 260 residuals where obtained from Statistica, and have been computed based on the AR(2) model identified in the first iteration. With these values, we built the X and MR charts for residuals, presented in  An analysis of the patterns of the X and MR charts provided in Fig 4 reveals the occurrence of some special causes of variation in both the X and MR charts. These out of control points indicate that the parameters ∅ and ∅ had changed, implying that the quality variable X was out of control. As a result, the process was out of control. As explained in Section 3.2, these out of control points have been replaced by their expected value, determined by the AR(2) model identified in Iteration 1. Table 9 summarizes the observations of residuals that indicate out of control situations, their values, and the observed and the expected values of the AVGHT characteristic. After replacing the out of control observations provided in Table 9 by their expected values, a new time series is obtained. As a result, a second iteration is needed to identify the ARIMA model that best fits the new time series. Following the same steps carried out in Section 4.2, we identified the new ARIMA model, which is still an AR (2)  An analysis of the patterns in the X and MR charts provided in Fig 5 reveals that both charts still exhibit out of control points. Therefore, we performed a third iteration, in which we obtained the residuals plotted in the X and MR charts, provided in Fig 6. Analyzing the patterns of the charts provided in Fig 6, we can conclude that the process is under statistical control. Once this requirement is achieved, the model that best fits the time series is still an AR (2) residuals must be independently and normally distributed, with mean zero and constant variance . Analyzing the X chart provided in Fig 6, it is possible to confirm that the average value of the residuals is approximately zero. Note that the average value of the residuals corresponds to the center line of the X chart. The variance of the residuals is estimated using the estimator ´, as follow: = ´ = 13.554 1.128 = 12.016 ≤ = 144.384 where the value of the constant = 1.128 is obtained from the table of constants for control charts for variables (see (Montgomery, 2020), Appendix VI), considering = 2, which is the number of consecutive observations used to determine each .
To test the independence of the residuals from the fitted AR(2) model, we designed the correlograms of the EACF and the EPACF, shown in Fig 8. The analysis of the correlograms presented in Fig 8 indicates that Lag 9 is slightly outside the confidence interval, in the ACF and the PACF. However, we consider that these autocorrelations imply an error that is not significant. Therefore, the hypothesis of residuals independence has not been rejected. Therefore, we consider that the independence assumption is reasonably satisfied. To test the normality of the residuals' distribution, we applied the Kolmogorov-Smirnov (K-S) test. Analyzing the histogram in Fig 9, we find that the K-S test statistic value is = 0,02828.
As the total number of observations = 260 is greater than 30 (see (Lilliefors, 1967) ), considering a significance level = 5%, the critical value of the statistic is = . √ = 0.0549. As < , the null hypothesis is not rejected, concluding that there are no evidences that the residuals are not normally distributed.  Since the requirements for process stability, residual independence and residual normality, are satisfied, we are able to state that the fitted AR(2) model is valid and its parameters are provided in Table 10. where the value of the parameter to determine the process mean, = 40.543, was computed based on the ( ) equation for an AR(p) model, provided in Table 5Error! Reference source not found.. Using the mathematical model provided above, we computed the predicted values for the next 40 observations on the AVGHT, taking into account and , the last two observations from the sample of 260 observations used to fit the ARIMA model. Having on hand these predicted values, we computed the prediction errors (e) and proceeded to process monitoring, Phase II (see following Section).

Monitoring
In this phase, we applied the Shewhart control charts to the prediction errors, namely the e and MR charts, as explained in Section 3.2. Table 11 provides the real values of the next 40 observations used to compute the prediction errors. To monitor these prediction errors provided in Table 11, we computed the control limits, using the equations for the e and MR charts, provided in Table 5Error! Reference source not found.. The standard deviation of the prediction errors, , is computed using Equation 10, which led us to determine the coefficients of the polynomial ( ) = + + + ⋯ As the AR(2) model is defined by ( ) = and, considering that = ( ) , we obtain ( ) ( ) = 1. Developing this equation, we obtain The values of the coefficients are then, given by Having determined the coefficients and the , it is possible to compute the control limits of the e and MR charts, applying the corresponding equations provided in Table 5Error! Reference source not found.. The values of these control limits, the coefficients , and , at each time t, are summarized in Table 12. Using the data provided in Table 12, we built the control chart shown in Fig 10, which will be used to monitor the prediction errors provided in Table 12.

Discussion
The analysis of the patterns of the control chart illustrated in Fig 10 reveals the existence of some special causes of variation. However, apart from observation 269, the out of control points under the LCL suggest that the performance of the call center has improved. Prediction errors are computed from the difference between the observed value and the predicted value for a given time t, implying that, in the context of this study, prediction error is a "small-is-better" characteristic. In other words, the lower the value of prediction error, the higher is the performance. According to this rationale, the analysis of the chart provided in Fig 10 suggests that something desirable has happen at time 268,270,272,273,274 and 287, which resulted in observed values that are significantly less than the expected ones (short Hold Time is desirable to increase the customer satisfaction). From this perspective, the appropriate approach that the call center manager should adopt is to investigate the root causes of such performance improvement and try to replicate them in the future, in order to achieve higher performance. On the other hand, the data point over the UCL suggests that the observed Hold Time at time 269 was significantly higher than the expected value, which is undesirable. In that situation, the call center manager should also investigate the root causes of such bad performance and take corrective actions in order to improve performance and bring the process to an in-control condition. This may have been the approach that was actually adopted by the call center manager. In fact, the run from time 270 to 274 suggests that after the occurrence of an unacceptable value at time 269, effective action was taken that resulted in a significant decrease in the Hold Time and consequently improved performance. If this is, in fact, what happened, such improvement action should be implemented for similar situations in the future.

Conclusions
This paper presents a methodology for modeling and controlling the performance of call centers, which integrates ARIMA modeling and SPC. Process modeling through ARIMA models, in addition to being able to model the pattern of a certain characteristic of a process, in conjunction with SPC make it possible to monitor of the random behavior (residual) of that characteristic. In this example the characteristic is AVGHT. The integration of ARIMA and SPC is an approach that has not been widely used, particularly in processes where autocorrelated data are generated, as is the case for call centers. However, as suggested by the case study results reported here, the joint approach ARIMA/SPC provides a set of effective tools that can be used to model autocorrelated processes and monitor their behavior, and support decision making in the daily operations of call centers. The control chart for prediction errors proves to be an effective way to routinely monitor the evolution of the AVGHT characteristic and to detect pattern changes due to abnormal situations. Another important conclusion is that, for the sample data used in Phase II, the call center under analysis performed very well. However, the monitoring of the AVGHT characteristic must be an ongoing activity in order to test if this good performance was something temporary or if it is standard. This ongoing activity is also necessary to identify any undesirable sources of pattern changes and continuously decrease the AVGHT.
Although the proposed methodology is intended to forecast and control call center KPIs, it can easily be extrapolated to other industries where autocorrelated data are generated (e.g. chemical industry). As mentioned in Section 1, nowadays a massive amount of data has been generated in most industries. Therefore, effective tools are needed to extract knowledge from those massive datasets. However, there are also some difficulties that may difficulty its implementation, namely: In terms of relevance to theory, this is the first time that the joint approach ARIMA/SPC has been used to model call center performance. The main contribution of this joint approach is that it incorporates the monitoring phase (Phase II -see Section 4.4), something that has not been done in previous studies. In addition, this paper provides detailed guidelines on how to model processes where autocorrelated data are generated, how to identify the model that best fits the data, and how to compute and monitor prediction errors. With regard to the practical contribution, the proposed methodology is intended to inform policies and decision-making in call centers, i.e. to monitor the performance of their agents and quickly detect pattern changes due to abnormal situations. By quickly detecting such abnormal situations, managers can take prompt and appropriate action to correct the process and, at the same time, prevent the occurrence of such undesirable situations in the future. On the other hand, when a pattern change has a positive impact on the performance of the call center (e.g. decreasing trend in the AVGHT), managers can quickly investigate the reasons and try to replicate them in the future. This will help to improve the performance of call centers and increase customer satisfaction.
As is known, nowadays companies are recording data related to every activity and consequently producing massive amounts of data. One of the biggest challenges is the possibility of having autocorrelation in the dataset. The methodology proposed in this paper can be a valuable tool, not only to deal with such potential autocorrelation in the dataset but also to effectively monitor and quickly detect the changing pattern behaviors. This will support managers in predicting critical situations before happening and take appropriate actions to maintain the desired level of process performance.
Despite its potential theoretical and managerial contributions, this paper has an important limitation. Namely, a single case study was conducted and only one call center KPI was modeled. For these reasons, the conclusions may not generalized. Future research should apply and test the proposed methodology in more call centers and other services/industries where autocorrelated data are common to improve the generalization. In such future studies, more KPIs must also be modeled.