Comparative Analysis of Machine Learning Methods for Non-Intrusive Indoor Occupancy Detection and Estimation

Occupancy-driven application research has been active for a decade, focusing on improving or augmenting building infrastructure to raise building energy efficiency. Existing approaches to HVAC energy saving place increasing emphasis on occupancy detection, estimation, and localization to trade off energy consumption against thermal comfort. In a non-intrusive approach, various sensors, actuators, and data-analytic methods are commonly used to process data from the occupants' surroundings and trigger appropriate actions. However, the performance of non-intrusive approaches reported in the literature is relatively poor, owing to low-quality training datasets and inappropriate choices of machine learning model. This study proposes a non-intrusive approach that improves the collection and quality of the dataset through data pre-processing. A training dataset was collected using various sensors installed in the building, and five machine learning models were developed to determine occupants' presence and estimate their number. The proposed solution was tested in a living room with a prototype system integrating various sensors designed to capture the occupants' surrounding environmental data. The prediction results indicate that the proposed solution can obtain, process, and predict the occupant number with high accuracy (73.6-99.7% using random forest).


Introduction
Indoor occupancy detection and estimation play a significant role in improving building infrastructure such as smart buildings, indoor intrusion detection, evacuation, building operation, and demand control applications (DCA) [1]. DCA provides a demand-driven feature that requires essential occupancy information to manage and reduce electric appliances' energy consumption. Demand control ventilation (DCV) is one area of DCA that has recently gained increased research attention in building energy efficiency, balancing energy consumption with thermal comfort requirements. Studies in [2,3,4] show that DCV can save up to 60% of the building energy waste and improve indoor thermal comfort. Recently, researchers have applied different technologies alongside machine learning (ML) methods to obtain occupancy data and improve DCV [5,6,7]. These technologies include camera-based [3,8,9] and wearable [10] systems, which are currently deployed in commercial and residential buildings. Adoption of these technologies is forecast to decline or even be discontinued in future smart buildings due to privacy concerns [3,11,12]. Consequently, a non-intrusive technique using indoor environmental (IE) sensing was introduced in [13,8,14,15,16] to measure the level of indoor variables, including temperature, humidity, and light intensity, and deduce the occupancy status and number. Studies in [17,18] demonstrate how occupancy parameters can fine-tune energy consumption through the "energy use per person" metric, establishing a stochastic parameter correlation between occupancy and energy usage. The results show energy-saving potentials of 8.9% in the classroom, 3.1% in an office environment, and 1.3% in the computer room. A similar study [18] uses an occupancy-driven thermostat in a residential building under three different settings (occupancy-driven, schedule-based, and always on).
The results showed the occupancy-driven approach can provide 11-34% energy savings while maintaining a satisfactory indoor comfort level.
Although occupancy detection and estimation from environmental variables are less direct than alternative approaches like cameras or wearables [9], accurate modeling with high reliability is possible [3]. This approach is based on measuring the variation of indoor variables, which is directly influenced by the number of occupants present [14,19]. Occupancy detection and occupancy estimation can be modeled as binary and multi-class prediction problems, respectively [8].
With privacy-invasive technologies ruled out, building-sector research is currently looking into strategies to improve existing binary and multi-class occupancy prediction, which can be integrated into the HVAC thermostat to balance energy consumption in proportion to room occupancy [20]. The model must have adequate knowledge of the environment and of the specific actions expected to meet this challenge. Unlike previous models [8,13,14,19,21,22,23] that employed direct sensing of one or two variable parameters, the proposed model uses feature correlation derived from five independent variables for multi-class occupancy prediction and a single variable parameter (CO2) for the binary prediction problem, alongside ML.
The proposed approach can replace the wearable approach, eliminating the need for a third-party device and its drawbacks, and the camera, ensuring occupant privacy during real-time occupancy prediction. The predicted occupant number can be used to set the required optimum temperature, adjusting HVAC operation according to the number of occupants in the building. The study makes the following contributions:
• The study presents a comprehensive model for novel environmental-sensing occupancy prediction that combines variable correlations and ML for the real-time occupancy prediction problem.
• The study also presents experimental results for both binary and multi-class occupancy prediction using five popular ML methods and compares them with similar existing methods through experiments.
The organization of the study is as follows: Section 2 reviews the literature related to indoor occupancy detection and estimation. Section 3 describes the methodology of this study. Section 4 presents the experimental work, including model development, testing, and results. Section 5 discusses the findings and compares the experimental results with the existing literature, and finally, Section 6 provides the conclusions of the study.

Literature Review
Integration of occupancy detection and estimation features in a control system is essential to support and exercise DCV. This study adopts the occupancy prediction classification used in [3] to categorize the reviewed approaches, as described in Table 1: occupancy detection, a binary indication of whether the space is occupied, which is prone to false alarms and cannot provide additional occupancy information [13,14,15,16,21,24,25,26,27,28,29,30,31,32,33]; and occupancy count/estimation, which refers to how many occupants, or what level of occupancy, is present in the space [3,8,9,34,35,36]. The environmental variable sensing approach is proposed in [13,14,15,16,21,24,25,26,27,28,29,30,31,32,33] to predict room occupation by measuring the variation of indoor parameters. The sensor modalities adopted in these studies extend to many indoor sensing applications, including multi-sensing technology to observe concentrations of volatile organic compounds in the air, wireless sensor networks for monitoring indoor air quality, and smart climate technology for weather forecasting. These solutions have proven capable of performing indoor occupancy detection and of providing historical indoor occupancy schedules at minute-level resolution.
Camera-based approaches (infrared and optical cameras) are used in [3,8,9,34,35,36] alongside machine learning to carefully analyze captured image frames for occupancy detection and estimation in commercial and residential buildings. Fusion modalities are considered to differentiate human occupants from other objects emitting thermal heat in the environment and to support night-vision prediction. The camera-based approach can handle binary and multi-class occupancy prediction accurately, achieving up to 96% accuracy and 26% energy-saving potential. However, the approach has drawbacks, including high cost and processing power, privacy concerns, a limited coverage perimeter, and unreliable prediction accuracy in crowded scenes or areas where occupants overlap.
Similarly, studies in [22,27,71,72,73,74] are among the early works to present occupancy estimation models that apply ML methods to indoor environmental variables. Other studies, including [14,19,32,37,75], differ from the earlier ones by considering the variable correlation factor among pre-processing hyper-parameters, which can improve the model's final prediction output. Findings in the literature indicate that the ML-based indoor environmental variable sensing approach can provide accuracy in the range of 73-75% in an office environment. However, accuracy drops when the number of occupants exceeds four persons in the space.
Wearable approaches are proposed in [10,24,46,51,52,53,54,55,56,57,58,59] to obtain occupancy information as a by-product of tasks completed by other systems, which can be used to track occupant location. An ML model can use signal intensity from statically positioned beacons in a target space to obtain fine-grained occupant locations, achieving a location accuracy of about five meters. This approach suffers hardware limitations, including privacy concerns and the scale of per-person hardware installation.
Activation of specific sensors with established positions has previously been used with passive infrared [60,61,62,63,64,65,66], acoustic [46,76], and WiFi signals [27,46,47,48,49,50] to obtain occupancy and location details through a heterogeneous sensing network. In these studies, multimodal data fusion and deep learning methods were employed to estimate occupancy. Passive infrared motion detection is commonly used to detect indoor activity but cannot differentiate between human and non-human occupancy. Acoustic people-counting algorithms are prone to nearby noise even with background sound cancellation, although deep networks trained on unlabelled acoustic signals can extend the approach to large-scale scenarios by assimilating location-specific gatherings. WiFi signal models of occupancy activities can enhance occupancy detection but raise strong privacy concerns, and the strategies used to pre-process and encode the sensor data sources can slow the prediction model.

Methodology
Dataset Acquisition and Selection Process
The datasets are collected in a residential setting, in the living room of a house with five rooms located at Taman Teratai, Johor, Malaysia, which has a tropical climate with average temperatures between 25 °C and 30 °C throughout the year. The living room is designed for occupants' gathering activities such as resting, eating, watching TV, and other social gatherings. Sensors (see Table 3) are installed on the ceiling to monitor indoor environmental qualities such as temperature, light illuminance, relative humidity, and CO2 concentration. In addition, occupants' entrances and exits are manually recorded in the living room to ensure the occupant counts tally with the sensor readings. Dataset collection ran from April 1st, 2021 to April 28th, 2021 using continuous readings. Only days with full-day readings and more than three stream columns in a row are considered. Additionally, records are swapped to avoid revealing occupancy schedules when the datasets are published, since [19] reports that CO2 concentration can be de-anonymized for privacy attacks. For odd days (Sunday, Tuesday, and Thursday), two consecutive rows' streams are randomly swapped, while for even days (Saturday, Monday, and Wednesday), the first two rows' streams are swapped sequentially. Although it is not considered in a recent study [19], this study introduces a humidity ratio computed from the original dataset streams to improve occupancy estimation accuracy.
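The row-swapping anonymization described above can be sketched as follows. The function name, data layout, and values are illustrative assumptions, not the study's actual pipeline; the only property carried over is the rule (random adjacent pair on odd days, first pair on even days).

```python
import numpy as np
import pandas as pd

def anonymize_day(day_df: pd.DataFrame, odd_day: bool, rng) -> pd.DataFrame:
    """Swap one adjacent pair of rows: random position on odd days, first pair on even days."""
    df = day_df.reset_index(drop=True)
    if odd_day:
        i = rng.integers(0, len(df) - 1)  # random adjacent pair (i, i+1)
    else:
        i = 0                             # always the first two rows
    order = list(range(len(df)))
    order[i], order[i + 1] = order[i + 1], order[i]
    return df.iloc[order].reset_index(drop=True)

rng = np.random.default_rng(0)
day = pd.DataFrame({"co2": [400, 500, 600, 700]})   # toy readings
swapped = anonymize_day(day, odd_day=False, rng=rng)
```

On even days the output is deterministic (first two rows exchanged); on odd days repeated runs swap different adjacent pairs, so the published schedule no longer aligns row-for-row with the true one.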
Dataset Pre-Processing
Dataset pre-processing is essential to check the normality of the dataset and ensure it does not contain outliers, which would affect the overall performance of the prediction model [14]. By the central limit theorem, it has been reported that when the sample contains 100 or more observations, violation of normality is not a critical problem [19]. However, regardless of sample size, the assumption of normality should be examined before drawing meaningful conclusions. Statistical summary (see Table 2) and Q-Q plot (see Figures 1 and 2) normality-check techniques are therefore conducted before making conclusions about the dataset's normality [77,78].

Normality test
The statistical summary approach (see Table 3) expresses the dataset's normality characteristics in statistical terms such as the mean, standard deviation, skewness, and kurtosis.
The statistical summary of the time streams, consisting of 2668 readings on the variable parameters (Date, Temperature, Humidity, Light, CO2, Humidity Ratio, and Occupancy), is presented in Table 4. The standardized skewness and standardized kurtosis determine whether the sample comes from a normal distribution. The standardized skewness and kurtosis values of the results lie within the range of -2 to +2, indicating no significant departure from normality; values outside this range would tend to invalidate the assumption of normally distributed data. Even though the statistical summary provides an impartial judgment of dataset normality, it may be insensitive at small dataset sizes or overly cautious at large ones.
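The standardized skewness and kurtosis check can be illustrated as follows. The synthetic CO2 stream and the ±2 decision rule in code are stated assumptions for the sketch, not the study's actual data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
co2 = rng.normal(600.0, 50.0, size=2668)  # synthetic CO2 stream (ppm), same length as the study's 2668 readings

n = co2.size
# Standardized statistic = raw statistic divided by its approximate standard error.
std_skew = stats.skew(co2) / np.sqrt(6.0 / n)
std_kurt = stats.kurtosis(co2) / np.sqrt(24.0 / n)  # scipy reports excess kurtosis

# Values inside [-2, +2] are consistent with a normal distribution.
looks_normal = abs(std_skew) < 2 and abs(std_kurt) < 2
```

For a genuinely normal sample of this size both standardized statistics fall near zero, so `looks_normal` is almost always true; heavy-tailed or skewed sensor streams push them outside ±2.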
Since our dataset is not small (it contains over 2,000 records), a graphical Q-Q plot test is also conducted (see Figures 1 and 2). Graphical analysis has the advantage of engaging judgment to assess normality in cases where a statistical summary test can be overly or insufficiently sensitive.
However, graphical assessment of normality requires a great deal of expertise to prevent incorrect interpretation. The data for graphical interpretation are usually presented as histograms or as Y and X vectors. According to [79], suppose Y is the variable that depends on the regression matrix of variables X; if X = (x_1, ..., x_n) are jointly normal, then Y conditional on X is normally distributed with mean µ = f(X). The graphical presentation of the normality distribution of the sample dataset is conducted using the Q-Q plot (see Figures 1 and 2).
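A Q-Q check can also be computed numerically rather than read off a plot. The sketch below uses synthetic streams (not the study's data) and scipy's probplot, whose fitted r value measures how closely the ordered data follow the straight reference line of a normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
humidity = rng.normal(65.0, 5.0, size=500)  # synthetic, roughly normal stream
light = rng.exponential(200.0, size=500)    # synthetic, clearly skewed stream

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
_, (slope_h, intercept_h, r_h) = stats.probplot(humidity, dist="norm")
_, (slope_l, intercept_l, r_l) = stats.probplot(light, dist="norm")
```

The near-normal stream yields r very close to 1, while the skewed stream's points bend away from the line and its r drops, which is the same judgment a reader makes visually on Figures 1 and 2.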
The analysis indicates that the dataset points do not fully follow a normal distribution and contain some variance, requiring data analysis at this stage to approach a Gaussian distribution. After manual inspection of the unfitted points, it was concluded that the skew is not caused by inaccurate sensor readings or recordings but arises spontaneously, is not inherently a concern, and should not affect the model's prediction results. Unfitted points appear in all variables, with more extreme values in the CO2 and occupancy variables. In a normal distribution, about 1 in 340 observations is expected to lie at least three standard deviations from the mean [80]. However, in smaller datasets, random chance can produce extreme values. In other words, odd values arise naturally, and there is nothing wrong with these data points; even though they are rare, they are a natural part of the data distribution.

Computing variable feature correlation
Variable feature correlation is critical for model feature selection and can enhance the model's prediction performance. Feature correlation is assessed from the dependency of the predicted variable on the predictors. Figure 3 visualizes the distribution of the indoor occupancy variable (predicted variable) in relation to the other indoor variables (predictors) during periods of room occupation. Figure 3 indicates that all variables correlate strongly with room occupation, especially CO2 and humidity. However, the significance of the correlation between occupancy and the predictors cannot be readily determined from Figure 3.
This study uses the Pearson Product-Moment Correlation coefficient (PPMC) to generate correlation values. PPMC measures the strength of dependency between variables x and y for a set of paired (x, y) values, yielding values between -1 and +1 [14,30]. Figure 4 presents the computed PPMC values for the six variable parameters. A value of 1 indicates a strong positive correlation (shaded with a white background), 0.9 is shaded with a red background, and so forth down to 0.00 and -0.00, shaded with a green background, indicating a weak correlation between the variables. Predictors that are uncorrelated, or only weakly correlated, with the predicted variable are the most likely candidates for removal from the model using the variable permutation importance measure, known as feature selection. Furthermore, if two variables are highly correlated, only one of them should be retained to simplify the model; simpler models are easier to understand.
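The PPMC computation can be sketched with pandas on synthetic streams; the column names and values below are illustrative assumptions, not the study's dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
occupancy = rng.integers(0, 5, size=300)                      # 0-4 occupants
co2 = 450 + 120 * occupancy + rng.normal(0, 20, size=300)     # CO2 rises with occupancy
noise = rng.normal(0, 1, size=300)                            # uninformative stream

df = pd.DataFrame({"Occupancy": occupancy, "CO2": co2, "Noise": noise})
corr = df.corr(method="pearson")            # full PPMC matrix, values in [-1, 1]
co2_corr = corr.loc["Occupancy", "CO2"]     # strong positive correlation
noise_corr = corr.loc["Occupancy", "Noise"] # near zero
```

The matrix `corr` plays the role of Figure 4: strongly correlated predictors sit near ±1, and streams like `Noise` (or the Date column in the study) sit near 0 and become removal candidates.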

Variable Feature Selection
Feature engineering is essential in developing ML models and requires removing weakly correlated features before deploying the dataset sample into the model for evaluation. The variable importance measure in [79] is used to remove uncorrelated variable parameters. Following [79], let Y be the predicted variable and X = (X_1, ..., X_p) a vector of random predictors. In the regression setting, a rule f for predicting Y is a function with values in R, and its prediction error is the expected squared difference between Y and f(X_1, ..., X_p). Since the true prediction error of f is unknown in practice, it is estimated on the observations of a test dataset D. Permutation variable importance is a model inspection technique [77] that has shown proficiency with non-linear estimators such as our model and is therefore adopted in this study. The technique considers a predictor X_j critical for predicting Y if breaking the link between X_j and Y increases the prediction error; the size of the increase reflects how much the model depends on that feature. The methodology has the advantage of being model agnostic, allowing it to be measured several times with various permutations. To realize this, [77] randomly permutes the observations of X_j.
Formally, define the out-of-bag samples D̄_n^t = D_n \ D_n^t, t = 1, ..., n_tree, and let D̄_n^{tj}, t = 1, ..., n_tree, denote the permuted out-of-bag samples obtained by randomly permuting the values of the j-th variable in each out-of-bag subset. The statistical permutation value of variable X_j is the mean, over the trees, of the increase in prediction error between D̄_n^{tj} and D̄_n^t. This quantity is the statistical equivalent of the permutation importance measure I(X_j) recently formalized by Zhu [81]. Let X_{(j)} = (X_1, ..., X'_j, ..., X_p) be the random vector in which X'_j is an independent replicate of X_j that is also independent of Y and of all other predictors; the permutation importance measure is then I(X_j) = E[(Y − f(X_{(j)}))^2] − E[(Y − f(X))^2]. In the expression of I(X_j), permuting the values of X_j mimics an identical and independent duplicate of its distribution. Equation 4 thus yields the correlation index between the predicted variable and each independent variable, as presented in Table 4. The predictors' correlation indices with respect to the predicted variable are computed and displayed in Table 4 to simplify identifying and removing predictors with weak correlation values. The predictor Date shows a weak correlation index and is therefore removed from the original dataset. The remaining variables are fed to the model to train the machine learning methods and measure their accuracy against the test dataset.
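scikit-learn's permutation_importance implements this inspection technique. The sketch below uses synthetic data with an uninformative stand-in for the Date predictor (an assumption for illustration) and shows the weakly correlated feature receiving a near-zero importance score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 600
co2 = rng.normal(600, 100, size=n)
date_code = rng.normal(0, 1, size=n)   # uninformative stand-in for "Date"
y = (co2 > 600).astype(int)            # occupancy driven by CO2 only
X = np.column_stack([co2, date_code])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Importance = mean drop in held-out score when a feature's values are permuted.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
co2_importance, date_importance = result.importances_mean
```

Permuting the informative CO2 column destroys the model's held-out accuracy, while permuting the Date-like column barely changes it, which is exactly the criterion used here to drop Date from the dataset.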

Experimental Work
During model training, datasets are typically split into train and test portions when ML algorithms are employed to make predictions and measure their performance. The technique is a straightforward and quick way to assess prediction performance across ML methods and to choose the optimal method for the prediction problem. It entails shuffling and splitting the original dataset into training and test portions in a chosen ratio, for example 70:30 (see Figure 5). The first portion, known as the training dataset, is used to fit the model. The second portion, known as the test dataset, is fed to the model to test its predictions and measure the prediction outcomes.
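The 70:30 shuffle-and-split step maps directly onto scikit-learn's train_test_split; a minimal sketch with toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 toy samples, 2 features
y = np.arange(50) % 2               # toy binary labels

# shuffle=True randomizes row order before splitting; test_size=0.3 gives a 70:30 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)
```

Fixing `random_state` makes the split reproducible, so repeated experiments across the five candidate methods see identical training and test portions.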

Candidate Model
Five candidate ML methods have been chosen to examine their performance on both binary and multi-class occupancy prediction problems [17]. These models are less complex than many of the more recent developments in this field, but they are well known and regularly serve as performance baselines. Another advantage of these methods is that they are foundational options for many applications beyond occupancy detection and estimation and, as such, are well served by ML libraries. All implementations in this work use the scikit-learn Python library, and details about default algorithm settings can be found in the library documentation [82]. The remainder of this section offers a detailed overview of the chosen ML methods and their prediction results on both binary and multi-class occupancy prediction problems.

Random forest
Random Forests (RF) are collections of decision trees that are applied from a root (parent) node to terminal (child) nodes to predict the behavior described by the training data [17,77]. The technique provides several conditional rules, which can be as simple as comparing a sensor reading to a threshold, to match data samples by related traits [82]. Each decision tree employs bootstrap sampling, also known as bagging [78], which essentially uses two-thirds of the training samples for fitting and the remainder for evaluating prediction accuracy, for both shallow and very deep trees. This implies each tree in the RF works toward the same target but is given a separate portion of the training data to learn from. The outcomes from all trees are aggregated to generate the final result. These rules influence how the models handle bias and uncertainty in their forecasts. The number of decision trees the model can use to fit the data is generated from multiple regressions and recursive splitting of the dataset during analysis [83]. For binary prediction, a single predictor variable (CO2) is used to predict room occupation status, and the prediction results using RF are presented in Table 5. As Table 5 shows, the RF classifier is evaluated to verify its prediction performance on new data, because in many cases ML classifiers perform well when tested on the original training dataset and differently on a new dataset. The scoring bin in Table 5 therefore holds the dataset records split into a training and a testing dataset. Binary prediction performance ranges from 58.3% to 99.6% for accuracy, 73.6% to 99.7% for F1 score, 58.3% to 99.9% for precision, and 97.8% to 100% for recall.
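A minimal sketch of the single-predictor (CO2) binary setup with scikit-learn's RandomForestClassifier, using synthetic readings in place of the study's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1000
co2 = rng.normal(500, 80, size=n)
# Occupied when CO2 (plus measurement noise) exceeds a threshold.
occupied = (co2 + rng.normal(0, 20, size=n) > 520).astype(int)
X = co2.reshape(-1, 1)   # single predictor column, as in the binary case

X_tr, X_te, y_tr, y_te = train_test_split(X, occupied, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
```

Scoring on the held-out 30% rather than the training portion guards against the over-optimistic accuracy the section warns about.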

Naive bayes classification
One of the most powerful and effective classification algorithms is Naive Bayesian classification (NBC). The algorithm is based on the Bayes theorem of probability, first proposed by Reverend Thomas Bayes [84,85]. The theorem states that a hypothesis's likelihood is a function of recent evidence and prior knowledge: it is a way of figuring out how a new piece of evidence affects the probability that a hypothesis is correct. It has been used in a wide range of applications. In real-world applications, most machine learning techniques concentrate on learning over continuous feature sets.
Nevertheless, several classification tasks include continuous features that cannot be handled without first being discretized. The naive classifier has the advantage of being easy to construct, requiring very little domain expertise, compared with general Bayesian networks, which may necessitate several extensive sessions with experts to create the true dependency structure across features.
In addition, variable discretization can optimize the time and space constraints, significantly improving the induction algorithm's performance. The NBC binary occupancy prediction using CO2 data is presented in Table 6. The RF classifier (Table 5) performed slightly better than the NBC classifier (Table 6), whose performance ranges from 58.3% to 99.1% for accuracy, 73.6% to 99.2% for F1 score, 58.3% to 99.9% for precision, and 87.4% to 100% for recall.
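Gaussian naive Bayes in scikit-learn handles continuous features directly by fitting per-class normal densities, sidestepping explicit discretization. The two synthetic CO2 regimes below are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(5)
# Two well-separated synthetic CO2 regimes: vacant (~420 ppm) vs occupied (~750 ppm).
vacant = rng.normal(420, 30, size=200)
occupied = rng.normal(750, 60, size=200)
X = np.concatenate([vacant, occupied]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

nb = GaussianNB().fit(X, y)
pred_vacant = nb.predict([[400.0]])[0]     # reading well inside the vacant regime
pred_occupied = nb.predict([[800.0]])[0]   # reading well inside the occupied regime
```

With the class-conditional Gaussians estimated from the data, Bayes' rule assigns each new reading to the regime under which it is most probable.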

Support vector machine
The Support Vector Machine (SVM) algorithm does not require the same assumptions as the linear discriminant analysis (LDA) model to make predictions. It operates by locating the boundary that maximizes the margin between the groups to be divided, which is achieved in a high-dimensional space. The boundary is discovered by fitting the data samples with a chosen kernel function, which encodes the relationship of neighboring data. Example kernels include linear, polynomial, sigmoid, and radial basis functions. In this approach, the kernel is the radial basis function.
This approach uses only the data samples nearest to the boundary, which has the advantage of not needing the entire dataset to make decisions. When data samples are projected into a high-dimensional feature space, they are believed to have greater separability, making SVM well suited to achieving high performance. Table 7 presents the model's prediction performance using SVM. The data in Table 7 indicate the SVM classifier underperformed compared with the RF and NBC classifiers, with performance ranging from 58.3% to 86.7% for accuracy, 73.6% to 87.7% for F1 score, 58.3% to 99.9% for precision, and 72% to 100% for recall.
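The RBF-kernel SVM can be sketched as follows; the circular synthetic boundary is an assumption chosen purely to show a case a linear boundary cannot separate:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
n = 400
X = rng.normal(0, 1, size=(n, 2))                    # two standardized sensor-like features
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)    # label 1 inside the unit circle

# The RBF kernel implicitly lifts the samples into a high-dimensional space
# where the circular boundary becomes separable.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
train_acc = svm.score(X, y)
```

Only the samples near the circle end up as support vectors; the rest of the dataset does not influence the fitted boundary, which is the property noted above.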

Artificial neural networks
Artificial Neural Networks (ANNs) are biologically inspired structures designed for modeling estimation problems, in which a range of variables is predicted using sample data during training. A series of dependent and independent variables is used to learn the model responsible for the data in the neural network scheme. These networks are composed of individual neurons, and the weights of the connections between neurons are calculated using specific learning rules. The dataset was used to evaluate a neural network with two hidden layers, each with the same number of neurons. The backpropagation algorithm is used for learning: the network error is propagated backward from the output layer to the input layer, and the weights of each neuron are adjusted to decrease the mean-squared error between the outputs and the targets, until a given precision index is reached or a given number of iterative learning steps is completed. Once adequately trained and evaluated, the ANN model is used to forecast the final output from previously unseen input data. The results of the ANN analysis on binary occupancy prediction are presented in Table 8. The ANN classifier (see Table 8) performed better than NBC and SVM, with performance ranging from 58.3% to 99.5% for accuracy, 73.6% to 99.6% for F1 score, 58.3% to 99.9% for precision, and 95.3% to 100% for recall.
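scikit-learn's MLPClassifier provides a two-hidden-layer network trained by backpropagation. The sketch below uses synthetic CO2 readings, and the input standardization is our addition for stable training, not a step stated in the study:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
n = 500
co2 = rng.normal(600, 100, size=n)
occupied = (co2 > 600).astype(int)

# Standardize the input so gradient-based training converges reliably.
x = ((co2 - co2.mean()) / co2.std()).reshape(-1, 1)

# Two hidden layers with the same neuron count, trained by backpropagation.
ann = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(x, occupied)
acc = ann.score(x, occupied)
```

`hidden_layer_sizes=(16, 16)` mirrors the "two hidden layers with the same mixture of neuron numbers" setup; training stops once the loss improvement falls below the solver's tolerance or `max_iter` is reached.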

Logistic regression
Logistic regression (LR) predicts a dependent variable with two potential output values from one or more independent variables. The independent variables are evaluated on the dataset, typically using maximum-likelihood estimation, to decide which are appropriate predictors of the dependent variable. Model complexity in logistic regression is low when there are no or only a few interaction terms and variable transformations, so overfitting is less of a problem. Variable selection is a method of reducing a model's variability, and thereby the possibility of overfitting, but it may also reduce the model's versatility. The LR results for binary occupancy prediction are presented in Table 9. Lastly, the LR classifier results in Table 9 show it performed worse than the RF, NBC, and ANN classifiers but outperformed the SVM classifier, with performance ranging from 58.3% to 96.6% for accuracy, 73.6% to 97.1% for F1 score, 58.3% to 99.9% for precision, and 67% to 100% for recall.
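A minimal logistic regression sketch for the binary occupancy problem, again on synthetic CO2 readings rather than the study's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 500
co2 = rng.normal(550, 90, size=n)
occupied = (co2 > 550).astype(int)
X = co2.reshape(-1, 1)

# Coefficients are fitted by (regularized) maximum likelihood.
lr = LogisticRegression().fit(X, occupied)
acc = lr.score(X, occupied)
prob_high = lr.predict_proba([[800.0]])[0, 1]   # P(occupied | very high CO2)
```

Unlike the hard 0/1 outputs of the other classifiers, `predict_proba` exposes the fitted sigmoid directly, which is useful when a thermostat should act only above a confidence threshold.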

Model validation
This section deals with the multi-class occupancy estimation problem using the five ML methods described in Section 4.1; their performance analysis is presented in Table 10. Unlike binary occupancy prediction, which uses a single variable parameter (CO2) to predict whether the room is occupied, the multi-class occupancy estimation classifier uses five variable parameters to estimate the number of occupants present in the room, so that the model produces reliable results on a new dataset. Validation and result comparison are essential for deciding which method is good enough to solve the multi-class occupancy estimation problem. Typically, the accuracy metric alone cannot provide enough information for this decision, so additional metrics are considered, as described in this section.

Result Interpretation
Performance analysis using a confusion matrix
A confusion matrix is mainly used to illustrate the prediction performance of an ML classifier on a sample dataset with known actual values. This approach is relatively straightforward to understand (see Table 11). Note: in the notation, TPR and TNR represent correct model predictions, while FPR and FNR represent incorrect model predictions.
The graphical representation (see Figure 6) depicts the five algorithms evaluated on the model at different discrimination thresholds, with the TPR, TNR, FNR, and FPR plotted at each threshold condition. In ML, TPR and TNR are also known as the likelihood of correct detection, and FNR and FPR as the likelihood of false alarm. Confusion matrix analysis provides tools to select the best candidate prediction models and eliminate less optimal ones independently of (and before specifying) the cost context or the class distribution. The analysis corresponds to a direct and natural statistical performance measure in detection/classification theory and hypothesis testing, similar to cost/benefit analysis in diagnostic decision making.
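The confusion matrix entries can be extracted as follows; the labels are toy values, not the study's results:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # actual occupancy
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])  # classifier output

# Rows = actual class, columns = predicted class; ravel() flattens to TN, FP, FN, TP.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
```

Here three occupied samples are predicted correctly (TP), one occupied sample is missed (FN), three vacant samples are predicted correctly (TN), and one vacant sample triggers a false alarm (FP), matching the correct/incorrect split described in the note above.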
The trained models' prediction performance compared across the five ML algorithms demonstrated excellent reliability except for SVM, which showed higher FPR and FNR (see Figure 6). It is important to mention that the RF, NBC, ANN, LR, and SVM performance figures are markedly improved compared with the trained RF models in [14]. The clustered training and prediction significantly reduced the FPR for all models and offered a more reliable TPR for occupancy estimation.

Accuracy prediction
Accuracy metrics are used to evaluate the percentage correctness of the model's predictions. Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. Using the confusion matrix notation, it can be calculated as: Accuracy = (TPR + TNR) / (TPR + TNR + FPR + FNR).

Precision & Recall
In ML classification, precision and recall metrics are used to assess the model's level of prediction confidence. For example, the objective of our model is to estimate the occupancy number when the room is occupied, or to predict that the room is not occupied at all. In this setting, if the model is not well trained, we might end up with a model that always predicts the room is vacant with 99% confidence but is 0% useful. With precision and recall values, however, we can tell whether something is wrong with our model. Precision ensures that the occupancy number is not grossly over-estimated; recall ensures that an occupied room is not overlooked. We do not want our model to incorrectly predict a high level of room occupation, which would increase the ventilation demand on the HVAC system or raise false alarms for evacuation and safety management. At the same time, we do not want our model to predict that an occupied room is vacant, which can lead to discomfort under the HVAC system and failures in evacuation and safety management. The information in Table 10 shows that all the algorithms considered achieved 99% precision except for SVM, with a precision score of 95%. RF scored the best recall of 99%, followed by ANN and NBC with 98%, and LR with 92%, while the SVM model's recall suffered strongly, scoring 81%. Precision is defined as the ratio of TPR to the total number of positive predictions (TPR + FPR): Precision = TPR / (TPR + FPR). Recall is defined as the ratio of TPR to the total of TPR and FNR: Recall = TPR / (TPR + FNR).

F-Score
Ultimately, it is essential to have an overall metric that trades off precision against recall by measuring a single-value score.
Therefore, it makes more sense to merge the precision and recall metrics; the standard approach for combining them is known as the F-score or F-measure (see equation 8). In the F-score evaluation, the RF, ANN, and NBC models performed excellently by scoring up to 99%, followed by LR with a score of 96% (see Table 10). In contrast, SVM is the least-performing model, with an F-measure score of 87%. Mathematically, the F-score can be presented as F = 2 × (Precision × Recall) / (Precision + Recall).

Mean absolute error
In ML prediction, the Mean Absolute Error (MAE) refers to the magnitude of the difference between a model's prediction and the actual value of that observation, averaged over the whole test group. It is an easy-to-understand, quantifiable measurement of error for model prediction problems and is often used to summarize and assess the quality of an ML model. For a given sample test dataset, the MAE of a prediction model is the mean of the absolute values of the prediction errors over all instances of the test dataset, where the prediction error is the difference between the actual and predicted value for an instance. In essence, MAE indicates how large an error can be expected from the model's forecast on average. The MAE evaluation results in Table 10 show that RF has the least average forecasting error of 1.9%, followed by SVM and ANN with an average of 9.6%, then NBC with an average of 9.8%, while the LR model is expected to produce the highest average forecasting error of 10%.
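The precision, recall, F-score, and MAE definitions above can be sketched from scratch as follows; the confusion-matrix counts and toy occupancy values are illustrative, not the study's data.

```python
# Precision, recall, F-score, and MAE computed from scratch, using
# illustrative counts and toy occupancy values (not the study's data).
TPR, FPR, FNR = 80, 5, 20       # confusion-matrix counts, paper notation

precision = TPR / (TPR + FPR)   # TPR over all positive predictions
recall = TPR / (TPR + FNR)      # TPR over all actual positives
f_score = 2 * precision * recall / (precision + recall)

# MAE: mean absolute difference between predicted and actual occupancy.
actual    = [0, 2, 3, 5, 4]
predicted = [0, 2, 4, 4, 4]
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(round(precision, 3), round(recall, 3), round(f_score, 3), mae)
```

Note how a high precision (0.941) can coexist with a noticeably lower recall (0.8); the F-score (about 0.865) balances the two in a single value.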
Root mean square error
Root Mean Square Error (RMSE), also known as the root-mean-square deviation, uses Euclidean distance to demonstrate how far predictions differ from the observed true values. In ML, having a single value to judge a model's success is incredibly useful, whether during testing, cross-validation, or tracking after deployment. The RMSE score is straightforward to compute and interpret: it aggregates the residual difference between prediction and ground truth over all data points (see equation 10). The model prediction analysis in Table 10 reveals that RF and LR outperformed the others, with RMSE score values of 7% and 8%, respectively, while the SVM, ANN, and NBC models' RMSE score is 13%.
Here N is the number of data points, y(i) is the ith measurement, and ŷ(i) is its corresponding prediction. Note: RMSE is NOT scale-invariant, and hence comparing models using this measure is affected by the scale of the data. For this reason, RMSE is commonly used over standardized data.
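This definition can be sketched directly; the occupancy counts below are assumed values for illustration only.

```python
import math

# RMSE on toy occupancy counts (assumed values for illustration): the
# square root of the mean squared prediction error over N data points.
actual    = [0, 2, 3, 5, 4]
predicted = [0, 2, 4, 4, 4]

n = len(actual)
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
print(round(rmse, 3))  # 0.632
```

Because the errors are squared before averaging, RMSE penalizes large individual misses more heavily than MAE does on the same data.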

Relative squared error
Relative Squared Error (RSE) is used to evaluate model efficiency by comparing it to that of a basic predictor. The RSE divides the total squared error of the evaluated model by the total squared error of the simple predictor to normalize the total squared error. The values range from 0 to infinity, with 0 being the best value. Mathematically, the RSE Ei of model i can be measured by equation 11. As shown in Table 10, the RF and LR models achieved remarkable efficiency with RSE scores of 7% and 8.5%, while the remaining models' RSE is up to 13%.
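As a minimal sketch, the RSE normalizes the model's total squared error by that of a naive predictor that always outputs the target mean; the values below are illustrative, not the study's data.

```python
# RSE: total squared error of the model normalized by the total squared
# error of a naive predictor (the target mean). Values are illustrative.
actual    = [1, 2, 3, 4, 5]
predicted = [1.1, 2.0, 2.8, 4.2, 5.1]

mean_t = sum(actual) / len(actual)
rse = (sum((p - t) ** 2 for p, t in zip(predicted, actual))
       / sum((t - mean_t) ** 2 for t in actual))
print(round(rse, 3))  # 0.01
```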
where P(ij) is the value predicted by model i for sample set j (out of n sets), Tj is the target value for record j, and T̄ is given by the following equation:

Relative absolute error
Relative Absolute Error (RAE) is a metric for evaluating model prediction output in machine learning and other data-processing applications. RAE is expressed as a ratio in which the model's mean error (see equation 12) is compared to the errors produced by a negligible or naive model. A realistic model (producing better outcomes than a marginal model) will produce a ratio lower than one. Its value ranges from 0 to infinity, with 0 being the best value and values closest to 0 being better than higher ones. The evaluation results in Table 10 show that LR and NBC achieved the best RAE score of 1%, followed by RF with a score of 2%, and lastly SVM and ANN, which reported an RAE score of 11%.
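The RAE ratio can be sketched analogously to the RSE, using absolute rather than squared errors; the toy values below are for illustration only.

```python
# RAE: total absolute error of the model divided by the total absolute
# error of a naive mean predictor. Toy values for illustration only.
actual    = [2, 4, 6, 8]
predicted = [2.5, 4.0, 5.0, 8.5]

mean_t = sum(actual) / len(actual)
rae = (sum(abs(p - t) for p, t in zip(predicted, actual))
       / sum(abs(t - mean_t) for t in actual))
print(rae)  # 0.25
```

A value of 0.25 means the model's total absolute error is a quarter of what the naive mean predictor would incur, i.e. well below the break-even ratio of one.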

Coefficient of determination
The coefficient of determination (R2) indicates how well a model replicates observed results. It provides information on the probability of possible events occurring within the expected outcomes; the idea is that as more samples are added, the coefficient reflects the likelihood of a new point falling along the fitted line. It is the proportion of the dependent variable's variance that the independent variable can predict (see equation 13). The R2 values range from minus infinity to 1, meaning that values closest to 1 are preferable. The results analysis in Table 10 indicates that RF and LR attained the highest R2 score of 99%, followed by SVM and ANN with a score of 98%, and the NBC model with an R2 score of 95%.
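The standard R2 computation, 1 minus the ratio of residual to total sum of squares, can be sketched as follows; the values are illustrative, not the study's data.

```python
# R^2 = 1 - SS_res / SS_tot: the fraction of the target's variance the
# model explains. Values below are illustrative, not the study's data.
actual    = [3, 5, 7, 9]
predicted = [2.8, 5.1, 7.2, 8.9]

mean_t = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_t) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # 0.995
```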
Average log loss
Average Log-Loss (see equation 14) is used to evaluate the model's prediction efficiency based on the likelihood of a record being categorized as class 1, then assigning the data point (a record) to one of two classes (1 or 0) depending on whether the probability exceeds a threshold value, generally set to 0.5 by default. For efficient prediction, the model must first estimate the likelihood of the record being listed as class 1. Thus, the higher the log-loss, the more the predicted likelihood differs from the actual value. The information presented in Table 10 shows that RF, ANN, and NBC achieved the better average log-loss, with scores of 0.027, 0.04, and 0.068, respectively, followed by the LR and SVM models with average log-loss scores of 0.17 and 0.28. Here i is the record/observation, y is the actual value, p is the probability prediction, and ln refers to the natural logarithm.
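The average log-loss over binary labels y and predicted class-1 probabilities p can be sketched as follows; the labels and probabilities are illustrative values, not the study's results.

```python
import math

# Average log-loss over binary labels y and predicted class-1
# probabilities p (illustrative values, not the study's results).
y = [1, 0, 1, 1]
p = [0.9, 0.1, 0.8, 0.7]

log_loss = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)
print(round(log_loss, 3))
```

Confident, correct probabilities (such as 0.9 for a class-1 record) contribute little to the loss, while confident but wrong probabilities inflate it sharply.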

Discussion
Research on applying ML methods to non-intrusive indoor binary and multi-class occupancy problems is gaining momentum, especially in smart building applications. Most of the existing studies on the subject are specifically designed to handle either binary or multi-class occupancy prediction, with only a few that can handle both prediction problems. Those studies that combined the two solutions used the ML method without proper variable feature selection. This is why the performance of their multi-class prediction tends to degrade when the number of occupants goes beyond 4 [21,32,86] or spans 0-20 occupants [8,23]. Table 12 presents a comparison for the binary prediction problem, and Table 13 presents the multi-class occupancy prediction problem.
Considering the current context of the existing studies, the proposed approach utilizes historical occupancy data from sensors (CO2, occupancy numbers, and occupancy correlations with building environmental variables) through continuous occupancy monitoring and machine learning techniques to develop concrete occupancy prediction models for the examined building areas. The developed models can be stored, retrieved later, and regularly updated. Moreover, the proposed approach can perform both binary and multi-class occupancy prediction, which is an advantage compared with the existing studies (see Tables 12 and 13). The information presented in Table 12 indicates that CO2 is the most common environmental variable employed for binary occupancy prediction [12,14,19,21,22,23,33,73]. It is also revealed that light intensity has the highest accuracy but may suffer high false alarm rates, especially for a building that absorbs solar lighting. The current CO2 binary prediction capability ranges from 50% to 96% accuracy (see Table 12). We carefully collected, labeled, and pre-processed our dataset to avoid outliers and deployed it to our ML-based model. Our model reports accuracy ranges of 58.3% to 98.7% using RF, 58.3% to 92.6% using NBC, 58.3% to 83.7% using SVM, 58.3% to 97.2% using ANN, and 58.3% to 80.7% using LR (see Table 10). Data in Table 13 shows that the existing multi-class occupancy prediction [8] is within the range of 24% to 85%. These approaches can handle multi-class occupancy estimation to identify the exact number of indoor occupants using combined data from four indoor environmental parameters. In [8], an ML classifier was developed implementing the LR, ANN, and SVM methods. The prediction accuracy of that multi-class occupancy prediction is 24.43%, 24.90%, and 25.15% using the LR, ANN, and SVM methods, respectively.
The authors note that the lack of variable correlation in their model is one of the major factors limiting multi-class prediction accuracy. In comparison, the proposed model is designed with deep consideration of variable correlations to handle multi-class occupancy estimation problems. Using similar methods, our model's performance is 96%, 98.9%, and 87% with LR, ANN, and SVM, respectively.
Similarly, a CO2-based multi-class occupancy estimation model is proposed in [21] to estimate occupancy numbers and optimize indoor thermal comfort and HVAC energy usage. The initial prediction on these models shows that ANN reports high accuracy using a single CO2 dataset when there are no more than four occupants in the room. Later, in [12], the models were optimized through extensive training using datasets collected from four different buildings, and the performance results show 94.4% and 73.76% prediction accuracy, respectively. Our approach handles multi-class problems through dependent parameters derived from the combined correlations among independent variables, achieving 98.9% accuracy using the ANN method. The study in [14] uses five independent variables to handle multi-class occupancy prediction using various ML methods in an office building environment. It is demonstrated that RF reported high prediction accuracy, and the authors claimed to improve the previous prediction from 70-85% to 92-95%. In comparison with this approach, our model achieved 99.35% accuracy using RF.
In [19], datasets for developing and evaluating multi-class occupancy estimation using statistical and machine learning approaches are presented. The model proposed in [19] uses five indoor environmental parameters for model training and testing, deployed in three different rooms. An RMSE value of 0.075 is reported as the overall accuracy. However, more errors are reported as the number of people in the room increases. Thus, the authors use an interactive learning approach that exchanges information with the users to collect ground-truth data.

Conclusion
Occupancy detection and estimation features are essential in DCV to trade off energy consumption against thermal comfort. Several technologies have been researched, including cameras, wearables, indoor environmental sensing, and passive infrared sensors, to enable occupancy-driven applications for building energy efficiency. The literature shows that the environmental sensing approach can overcome the hardware limitations (including privacy, scalability, and lack of focus) of some of the most commonly used technologies (including cameras, wearables, and passive infrared sensors) and is suitable for commercial and residential building environments. A low-cost, non-intrusive occupancy prediction model that uses indoor environmental sensing and ML methods is proposed in this study. The proposed approach has solid prediction potential in many building domains, including providing sufficient occupancy information to fine-tune HVAC energy consumption. The proposed model uses indoor environmental occupancy-correlated data from five sensor streams installed in the living room; the collected historical occupancy-related data is used for model training and testing. The model is evaluated using five popular ML methods, and their prediction performance is measured using different metrics. The model prediction performance varies across the ML methods, with RF outperforming the others, achieving an overall accuracy of 98.7% for binary prediction using only the CO2 variable and 99.3% prediction accuracy for multi-class occupancy estimation. In contrast, the SVM method is outperformed by the other ML methods, as its overall prediction accuracy is only 87.6%.
Moreover, the results demonstrate that incorporating more variable parameters with a strong correlation into the ML method can help to improve occupancy prediction, rather than using a single variable parameter or the raw sensor data directly. At the same time, multivariable parameters or a complex model do not necessarily yield higher prediction accuracy. The results also confirmed that, with no exception, the proposed model tends to introduce prediction error as the number of occupants in the room keeps growing. It was observed during the experiment that the CO2 level does not significantly increase beyond seven occupants when the HVAC system is operating. This could be due to fresh air coming into the room to improve the indoor air quality. This problem needs further careful study and analysis in the future.