4.1: CUSUM
The CUSUM algorithm was tested on all other available energy levels and did not show any obvious trends prior to the points at which intervention was required. The results suggest that Medical Physicist intervention would be required once the dose change approached 1.6%, despite the tolerance of the DQA3 machine being 3%.
Figure 5 shows the time-series data for 6X. The alarms shown indicate when the tolerance threshold, set to 2% for this example, was exceeded. After an alarm was raised, the monitored cumulative sum was reset, marked by the 'start' points. The y-axis shows the amplitude of the dose tolerance.
Figure 6 shows the cumulative sums of all positive and negative changes in the data, depicted separately. The x-axis shows points labelled in chronological order from the start of January 2014 to the end of December 2019. The y-axis shows the daily change between data points.
The CUSUM model required thresholds to be determined manually and then retrospectively analysed for accuracy, which defeated the purpose of developing an algorithm able to provide a prediction based on the training dataset. CUSUM does have properties that make it well suited to change detection; however, its application in our final model was limited. The algorithm shows promise as a graphical tool for gaining insight into the types of trends seen in the changes of dose tolerance levels.
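For illustration only, a minimal tabular two-sided CUSUM over the daily output differences might be sketched as follows; the 2% alarm threshold and zero drift allowance are assumptions taken from the example in Figure 5, not the exact implementation used in this work.

```python
import numpy as np

def two_sided_cusum(daily_diff, threshold=2.0, drift=0.0):
    """Tabular two-sided CUSUM on daily output differences (% from nominal).

    Positive and negative shifts are accumulated separately; an alarm is raised
    and both sums reset whenever either exceeds `threshold`.
    """
    pos, neg = 0.0, 0.0
    pos_sums, neg_sums, alarms = [], [], []
    for i, x in enumerate(daily_diff):
        pos = max(0.0, pos + x - drift)   # accumulate upward shifts
        neg = min(0.0, neg + x + drift)   # accumulate downward shifts
        if pos > threshold or neg < -threshold:
            alarms.append(i)              # tolerance exceeded: flag for review
            pos, neg = 0.0, 0.0           # reset, mirroring the 'start' points in Figure 5
        pos_sums.append(pos)
        neg_sums.append(neg)
    return np.array(pos_sums), np.array(neg_sums), alarms
```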
CUSUM has been used in radiotherapy statistical process control with moderate success elsewhere [6][12][13], and although it did not meet the primary goal of this study, it has shown potential to be a valuable tool for assisting physicists in a radiotherapy department if developed further, namely by helping to strike a balance between the two competing mistakes made when allocating time to quality assurance procedures: (1) acting when a problem does not exist, and (2) not acting when a problem does exist [12]. A potential procedure for using CUSUM in daily QA management is detailed in Figure 7, adapted from [12].
4.2: Linear Extrapolation
The sinusoidal nature of daily outputs over time, as discussed previously, means a linear representation as a standalone feature will never easily account for day-to-day output drift, particularly when the direction of the output shift changes continually (from a positive to a negative difference from nominal). Simple linear modelling was therefore not an ideal tool for predicting daily output trends on a linear accelerator. One example of where failure occurs is illustrated in Figure 8, which shows a linear forecast for 15 MeV electrons, determined from a sample of 10 points. The forecast predicts a value of approximately +1% for 5/05/2015, assuming the trend calculated from 14/04/2015 to 28/04/2015 continued linearly.
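As a sketch of this approach (using hypothetical output values rather than the measured 15 MeV data), a straight line fitted to the last 10 daily points can be extrapolated forward with NumPy:

```python
import numpy as np

# Hypothetical daily output differences (% from nominal) for the last 10 QA points;
# dates are represented as ordinal day numbers for the fit.
days = np.arange(10, dtype=float)
outputs = np.array([0.2, 0.3, 0.35, 0.45, 0.5, 0.55, 0.65, 0.7, 0.8, 0.85])

slope, intercept = np.polyfit(days, outputs, deg=1)   # least-squares straight line
forecast_day = days[-1] + 7                           # e.g. one week ahead
forecast = slope * forecast_day + intercept
print(f"Linear forecast: {forecast:+.2f}%")
```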
The actual output data are presented in Figure 9, which shows some initial agreement with the linear trend prediction; however, there is a significant decrease (>2%) in output from 4/05/2015 to 5/05/2015. In this instance there was a significant issue with the machine requiring intervention due to a breakdown, which could not have been forecast using simple linear regression methods. Unpredictable events such as this lead to substantial under-fitting when using linear extrapolation.
4.3: Logistic Regression and Support Vector Machines
The use of a logistic regression algorithm requires independence between data points, which meant that when training the model our data could not be treated as a time-series dataset, leading to errors in the predicted values. There was also significant imbalance between the classes, since the data rarely contain any 'reset' points. For each energy level, fewer than 1% of the data points were 'reset' points, which meant that the model nearly always predicted that no reset was required on the validation sets used. This bias in the training set meant the model always predicted zero, which also resulted in a falsely high accuracy, as displayed in the confusion matrices below:
Figure 10 shows the confusion matrix when the class weighting is left unchanged. The class weighting parameter penalises mistakes on samples of a given class more heavily in order to put more emphasis on that class. In this case, there is a much higher frequency of '0' (no 'reset' for a data point) than of '1' (a 'reset').
The confusion matrix in Figure 11 shows the result when the class weighting is changed to be balanced, meaning that the smaller class, the 'reset' (or '1') class, is implicitly replicated until it contains as many samples as the larger class. When the class weighting is balanced, more false negatives result, meaning that assigning more importance to the smaller class reduces accuracy. In both cases the true negatives, i.e. correct predictions of 'no reset', are relatively high, which in turn is due to the bias inherent in the training data.
The SVM developed for binary classification yielded very similar results to logistic regression, since the same large imbalance was present in the classes being predicted. Different training-test splits were used to check whether the accuracy of either model would improve; however, this did not lead to any further promising results.
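A minimal, self-contained sketch of both classifiers is given below, assuming a scikit-learn implementation; the features and labels are simulated stand-ins with the roughly 1% 'reset' rate described above, not the clinical dataset, and `class_weight='balanced'` corresponds to the re-weighting discussed for Figure 11.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Simulated stand-in data: X are placeholder features, y are binary labels
# (1 = 'reset' required, 0 = no reset) with ~1% positives, as described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (rng.random(2000) < 0.01).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# class_weight='balanced' re-weights samples inversely to class frequency,
# implicitly up-weighting the rare 'reset' class (cf. Figure 11).
for model in (LogisticRegression(class_weight='balanced', max_iter=1000),
              SVC(class_weight='balanced')):
    model.fit(X_train, y_train)
    print(type(model).__name__)
    print(confusion_matrix(y_test, model.predict(X_test)))
```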
There was limited success with either algorithm, but applying them helped the authors recognise that any future modelling must be able to support time-series data.
4.4: ANN (LSTM Neural Network)
The use of an LSTM neural network provided the best results of the research conducted, as can be seen in Figure 12, which shows that the predicted values follow the trend of the validation set very closely, even though there is slight under-fitting in the model. This model used 200 epochs, meaning that the entire training dataset was passed forwards and backwards through the neural network 200 times. As the number of epochs increased, the root mean square error (RMSE) decreased approximately exponentially. The RMSE was found to decrease at a much lower rate beyond approximately 200 epochs, so there was little benefit in training for longer.
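For illustration, a hedged sketch of an LSTM forecaster of this kind is given below, assuming a Keras implementation; the 30-day input window, single 50-unit LSTM layer, and the synthetic sinusoidal series are assumptions standing in for the configuration and daily output data used in this work.

```python
import numpy as np
from tensorflow import keras

# Sliding-window supervised framing: predict the next daily value from the
# previous `window` values.
def make_windows(series, window=30):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., None], np.array(y)        # X shape: (samples, window, 1)

# Synthetic sinusoidal series standing in for the daily output differences.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 1500)) + rng.normal(0, 0.1, 1500)
X, y = make_windows(series)
split = int(0.8 * len(X))                             # chronological train/validation split

model = keras.Sequential([
    keras.layers.LSTM(50, input_shape=(X.shape[1], 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=200, batch_size=32,
          validation_data=(X[split:], y[split:]), verbose=0)

# Validation RMSE, as used to judge convergence with the number of epochs.
pred = model.predict(X[split:], verbose=0).ravel()
print(f"Validation RMSE: {np.sqrt(np.mean((pred - y[split:]) ** 2)):.3f}")

# Rolling 30-day forecast (cf. Figure 12): each one-step prediction is fed
# back into the input window, one day at a time.
window = series[-30:].tolist()
forecast = []
for _ in range(30):
    x = np.array(window[-30:])[None, :, None]
    window.append(model.predict(x, verbose=0).item())
    forecast.append(window[-1])
```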
The root mean square error is a measure of the difference between the predicted value and the actual value in the dataset, and it should be minimised as much as possible.
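Explicitly, for $n$ validation points with measured values $y_i$ and predicted values $\hat{y}_i$, the standard definition is

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$.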
One of the main drawbacks of the LSTM neural network is that, owing to its complexity and the number of epochs run, a high computational training time is required to build the model. If re-training on the dataset is required in future, at least an hour should be allocated, which is how long the initial training took. The forecast in Figure 12 shows a plausible prediction over the next 30 days; however, further ongoing testing will need to be performed to check the accuracy of this prediction.