5.1 Using a neural network model for weather prediction on the ACARR dataset
A case study is undertaken using an atmospheric dataset to validate the hypothesis presented in Section 4.3. The dataset for the case analysis is collected from the Advanced Centre for Atmospheric Radar Research (ACARR), located at Cochin University of Science and Technology (CUSAT), India. ACARR operates a modern and advanced ST wind profiler radar, used to better understand the circulation of the troposphere and lower stratosphere and its impact on the monsoon circulation of the Indian subcontinent. For the current analysis, the event rain is treated as the outlier class, and the base neural network obtained from simulation is used to classify it.
After the base model has been applied for outlier detection, domain-specific adaptations can be made to it. This section presents a case study using only the basic features, which were identified through a literature study. Rainfall rate depends on thermodynamic variables (Zawadzki et al., 1981). Studies show that rainfall over Cochin is influenced by both the coastal effect and the orographic effect, owing to its proximity to both the sea and the hills. The cloud database of the centre stores all the atmospheric features. However, studies show that temperature, humidity, wind speed, pressure and radiation are the factors relevant to the event rainfall (Jaseena et al., 2020; Mathew et al., 2021; Mohankumar et al., 2019), so these features are taken for the current analysis. As discussed, the outlier represents the event ‘Rain’ and normal points correspond to ‘No Rain’ events. An atmospheric dataset with 17520 data instances for the year 2018 is used for this analysis. The base neural network is used with the hyperparameters from Table 2, trained for 100 epochs with a batch size of 16.
Rain data classification is done using the set of features given in Table 4.
Table 4: Basic set of features used for rain data classification
| Sl. No. | Basic features |
| --- | --- |
| 1 | Temperature |
| 2 | Humidity |
| 3 | Wind speed |
| 4 | Pressure |
| 5 | Radiation |
Table 5: Data and results: weather prediction
| Quantity | Value |
| --- | --- |
| No. of normal points | (16133, 7) |
| No. of outliers | (1387, 7) |
| True Positive (outliers correctly classified) | 220 |
| True Negative (normal points correctly classified) | 2299 |
| False Positive | 920 |
| False Negative | 61 |
| Accuracy of the system (%) | 71.97 |
| Outlier detection rate (%) | 78.29 |
| False alarm rate (%) | 28.58 |
In Table 5, the outlier detection rate corresponds to the rain data classification rate. The results show that the outlier detection rate is in accordance with the hypothesis developed in Section 4. This validates the use of the neural network with the set of hyperparameters obtained from the generated dataset. Once the general framework for outlier classification is created, the same neural network model can be extended to perform domain-specific analysis.
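The rates reported in Table 5 can be reproduced from the confusion counts. The sketch below is a minimal check, not the authors' code. It assumes that the count of 2299 refers to correctly classified normal points (true negatives) and that 220 is the number of correctly classified outliers (true positives); this reading is adopted here because it reproduces the reported accuracy, detection rate and false alarm rate. The function name `rates` is hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Return (accuracy, detection_rate, false_alarm_rate) in percent.

    tp: rain points classified as rain
    fn: rain points classified as no rain
    fp: no-rain points classified as rain
    tn: no-rain points classified as no rain (assumed reading of 2299)
    """
    total = tp + fn + fp + tn
    accuracy = 100.0 * (tp + tn) / total
    detection_rate = 100.0 * tp / (tp + fn)      # share of rain points found
    false_alarm_rate = 100.0 * fp / (fp + tn)    # share of no-rain points flagged
    return accuracy, detection_rate, false_alarm_rate


# Counts taken from Table 5, under the assumption stated above.
acc, dr, far = rates(tp=220, fn=61, fp=920, tn=2299)
print(round(acc, 2), round(dr, 2), round(far, 2))  # 71.97 78.29 28.58
```

Under this reading, the evaluation set contains 3500 points, of which 281 are rain events.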
5.2 Extending the base neural network model to a nowcasting model
5.2.1 Nowcasting Model Development
The case study explained in Section 5.1 can be considered a weather forecasting system with zero lead time (current weather prediction): it uses current atmospheric features to predict the likelihood of the event rain in the current time frame. Nowcasting, in contrast, is the term used in meteorology for forecasting the weather of the next few hours (Brandyn et al., 2018). The base model obtained and used in Section 5.1 can be extended to a nowcasting model. This section presents a case study on how to use the neural network model for weather nowcasting after adding new derived features to the base weather prediction model. The objective is to extend the base model developed using the generated data by adding atmospheric features obtained from domain experts.
This case study uses the IoT system architecture illustrated in Fig. 7. Data collected from the radar (a sensor, in other applications) in the perception layer is transmitted to the cloud through the network layer. Based on the user’s requirements, selected data can be downloaded into a local database for processing and prediction. The business layer visualises the results, which helps users interpret them. This IoT architecture can be viewed as a general model for any application, with domain-specific changes in the dataset.
To develop a nowcasting system for this architecture, the literature was studied. Several data-driven approaches have been developed in recent years to predict rainfall (Manandhar et al., 2019). However, domain experts explain that rainfall depends on a myriad of atmospheric parameters (Manandhar et al., 2019), so a single feature cannot increase forecasting accuracy. Liu et al. (2019) emphasise that a good model cannot be constructed if the neural network depends on a single predictor. Therefore, adding domain-specific features to the basic neural network model can enhance rainfall forecasting accuracy. Since there is no strong correlation between rainfall and any of these meteorological parameters (Liu et al., 2019), new techniques or features must be identified to improve the accuracy of short-term rainfall forecasts. Studies show that current rainfall information of different magnitudes has an impact on upcoming floods (Jia et al., 2020). In accordance with this study, a new feature, current rain, is introduced for this analysis. The time of day used by Liu et al. (2019) is converted into categorical data because of its impact on prediction accuracy. Similarly, seasonal factors are included in accordance with the study by Ceglar and Toreti (2021) and the study of the Indian subcontinent by Kothawale et al. (2010).
This forms the set of three derived features: current rain; time of day, in four categories (12am-6am, 6am-12pm, 12pm-6pm and 6pm-12am); and season of the year, in four classes (Winter: Jan-Feb; Pre-Monsoon: Mar-May; Summer Monsoon: Jun-Sep; Post-Monsoon: Oct-Dec). Table 6 presents the domain-specific features used for the current analysis.
Table 6: Selected features for nowcasting
| Sl. No. | Basic features | Derived features |
| --- | --- | --- |
| 1 | Temperature | Current rain |
| 2 | Humidity | Season of the Year (4 classes) |
| 3 | Wind speed | Time of the day (4 classes) |
| 4 | Pressure | |
| 5 | Radiation | |
After the domain-specific features are selected, the base neural network model can be used directly, without any modification.
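The two categorical derived features can be computed from each record's timestamp. The following is a minimal sketch under stated assumptions: the function names and the one-hot encoding are illustrative choices that the text does not specify, and the Summer Monsoon class is assumed to run from June to September, that is, up to the start of the Post-Monsoon class in October.

```python
from datetime import datetime

# The four time-of-day categories named in the text, in 6-hour blocks.
TIME_OF_DAY = ["12am-6am", "6am-12pm", "12pm-6pm", "6pm-12am"]
SEASONS = ["Winter", "Pre-Monsoon", "Summer Monsoon", "Post-Monsoon"]


def time_of_day_class(ts):
    """Map a timestamp to one of the four 6-hour categories."""
    return TIME_OF_DAY[ts.hour // 6]


def season_class(ts):
    """Map a timestamp to a season (Jun-Sep for Summer Monsoon is assumed)."""
    if ts.month <= 2:
        return "Winter"
    if ts.month <= 5:
        return "Pre-Monsoon"
    if ts.month <= 9:
        return "Summer Monsoon"
    return "Post-Monsoon"


def one_hot(label, classes):
    """Encode a categorical label as a 0/1 vector (an illustrative encoding)."""
    return [1 if label == c else 0 for c in classes]


ts = datetime(2018, 7, 15, 13, 30)
tod = time_of_day_class(ts)
season = season_class(ts)
print(tod, one_hot(tod, TIME_OF_DAY))
print(season, one_hot(season, SEASONS))
```

With one-hot encoding, the two categorical features contribute eight input columns in addition to current rain and the five basic features; other encodings of the categories are equally possible.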
5.2.2 Nowcasting: Results
In this analysis, the base neural network model of Section 5.1 is applied to forecasting rain events at different lead times. Because nowcasting covers predictions 0-6 hrs ahead, the analysis examines forecasting accuracies for lead times of 0-6 hrs. Table 7 presents the nowcasting results after applying the domain-specific heuristics to the base model. The neural network model is the same as in Section 5.1, and so is the dataset: 17520 data instances for the year 2018.
Table 7: Forecasting accuracies with various lead times
Forecasting results (features: Time, Season, Temperature, Humidity, Wind Speed, Wind direction, Pressure, Radiations, Current Rain)
| Simple NN (T+N) | Rainfall forecasting rate | Rainfall forecasting false alarm rate |
| --- | --- | --- |
| Current | 82.4 | 24.2 |
| 1 hr | 81.2 | 34.3 |
| 2 hrs | 80.4 | 29.4 |
| 3 hrs | 80.6 | 29.5 |
| 4 hrs | 76.4 | 32.2 |
| 5 hrs | 78.2 | 31.9 |
| 6 hrs | 74.5 | 24.0 |
Prediction results from Table 5 (features: Temperature, Humidity, Wind Speed, Wind direction, Pressure, Radiations), Current: outlier detection rate 78.2; false alarm rate 28.5
The results and comparisons in Table 7 show that adding the derived features improved on the base model: for the current time frame, the forecasting rate rises from 78.2 to 82.4 and the false alarm rate falls from 28.5 to 24.2. This case study illustrates the steps needed to apply the base model to short-term rainfall forecasting. Within a lead time of 6 hrs, the neural network model keeps the forecasting rate between 74.5 and 82.4, with tolerable false alarm rates between 24.0 and 34.3.
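One way to set up the T+N forecasting task is to pair the features observed at time T, including current rain, with the rain label observed N steps later. The sketch below illustrates this pairing. It is not the authors' preprocessing code, and the helper name `make_lead_time_pairs` is hypothetical. It also assumes that the 17520 instances are evenly spaced over the year, which would give 48 records per day (365 × 48 = 17520), so that one hour corresponds to two steps.

```python
STEPS_PER_HOUR = 2  # assumption: half-hourly records (17520 = 365 * 48)


def make_lead_time_pairs(features, rain, lead_steps):
    """Pair features at time t (plus current rain) with the label at t + lead_steps.

    features: list of per-time-step feature lists, in time order
    rain:     list of 0/1 rain labels, same length and order as features
    """
    if len(features) != len(rain):
        raise ValueError("features and rain must have the same length")
    if lead_steps < 0:
        raise ValueError("lead_steps must be non-negative")
    n = len(rain) - lead_steps  # the last lead_steps rows have no future label
    inputs = [list(features[t]) + [rain[t]] for t in range(n)]
    targets = [rain[t + lead_steps] for t in range(n)]
    return inputs, targets


# Toy series with one feature (e.g. temperature) and a 1 hr lead time.
toy_features = [[30.1], [29.8], [29.5], [29.0], [28.7]]
toy_rain = [0, 0, 1, 1, 0]
X, y = make_lead_time_pairs(toy_features, toy_rain, lead_steps=1 * STEPS_PER_HOUR)
print(X)  # [[30.1, 0], [29.8, 0], [29.5, 1]]
print(y)  # [1, 1, 0]
```

With `lead_steps=0` the target equals the current rain label, so the current-rain input column would have to be dropped for the zero-lead-time case; how the "Current" row was handled is not stated in the text.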
5.2.3 Data level improvements in nowcasting accuracies
For any data analysis problem, improving forecasting accuracy is a necessary step in enhancing the basic model. Because the structure of the base neural network and its hyperparameters are kept unchanged, other measures are needed to increase accuracy. The dataset used in this analysis is class-imbalanced: the percentage of data instances in the class ‘rain’ is low compared with the ‘no rain’ class, so further optimisation can be done at the data level. Data-level methods for addressing class imbalance include over-sampling and under-sampling (Johnson and Khoshgoftaar, 2019). Under-sampling deliberately discards data, reducing the amount of data from which the model can learn. Over-sampling enlarges the training set and therefore increases training time (Chawla et al., 2004). This case analysis uses both over-sampling (up sampling) and under-sampling (down sampling) to improve the nowcasting accuracies. Table 8 and Table 9 give the results after up sampling and down sampling for lead times of 1 hr and 6 hrs.
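The two data-level methods can be sketched as follows for a binary rain/no-rain label. This is a minimal illustration of random under-sampling and random over-sampling to a 1:1 class ratio, not the authors' implementation; the function names are hypothetical, both classes are assumed to be present, and in practice resampling would normally be applied to the training split only.

```python
import random


def _split_indices(labels):
    """Return (minority, majority) index lists for binary 0/1 labels."""
    rain = [i for i, y in enumerate(labels) if y == 1]
    no_rain = [i for i, y in enumerate(labels) if y == 0]
    return (rain, no_rain) if len(rain) <= len(no_rain) else (no_rain, rain)


def _take(rows, labels, indices):
    return [rows[i] for i in indices], [labels[i] for i in indices]


def random_undersample(rows, labels, seed=0):
    """Down sample: keep all minority rows and an equal-sized random
    subset of the majority rows (the rest is discarded)."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return _take(rows, labels, kept)


def random_oversample(rows, labels, seed=0):
    """Up sample: keep every row and add minority rows drawn with
    replacement until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    kept = majority + minority + extra
    rng.shuffle(kept)
    return _take(rows, labels, kept)


# Toy data with 8% rain, similar to the class balance of the yearly dataset.
rows = list(range(100))
labels = [1] * 8 + [0] * 92
x_down, y_down = random_undersample(rows, labels)
x_up, y_up = random_oversample(rows, labels)
print(len(y_down), sum(y_down))  # 16 8
print(len(y_up), sum(y_up))      # 184 92
```

The toy output shows the trade-off described above: down sampling shrinks the data from 100 to 16 rows, while up sampling grows it to 184 rows.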
Table 8: Comparison of nowcasting accuracies (Lead time: 1 hr)
| Sampling | Percentage of rain data (or rain:no-rain ratio) | Rain data Detection Rate | False Alarm Rate |
| --- | --- | --- | --- |
| Without sampling | 8 | 81.2 | 34.3 |
| Down sample | 1:1 | 98.7 | 69.7 |
| Up sample: Random | 1:1 | 97.5 | 57.7 |
Table 9: Comparison of nowcasting accuracies (Lead time: 6 hrs)
| Sampling | Percentage of rain data (or rain:no-rain ratio) | Rain data Detection Rate | False Alarm Rate |
| --- | --- | --- | --- |
| Yearly (without sampling) | 8 | 74.5 | 24.0 |
| Down sample | 1:1 | 97.2 | 76.8 |
| Up sample | 1:1 | 97.2 | 69.8 |
Over-sampling and down sampling increase the detection rate, but at the cost of a much higher false alarm rate, so further analysis of this dataset is required. In particular, the impact of a higher proportion of outlier points (event rain) on forecasting accuracy is examined. For an atmospheric dataset, this can be done by dividing the data into different seasons.
Table 10: Season Wise Analysis (Lead time: 1 hr)
| Season | Percentage of rain data | Rain data Detection Rate | False Alarm Rate |
| --- | --- | --- | --- |
| Monsoon | 16.9 | 91.0 | 43.7 |
| Autumn | 7 | 54.2 | 4.0 |
| Summer | 3.8 | 51.4 | 5.0 |
| Winter | 0.6 | 5.0 | 3.4 |
Table 11: Season Wise Analysis (Lead time: 6 hrs)
| Season | Percentage of rain data | Rain data Detection Rate | False Alarm Rate |
| --- | --- | --- | --- |
| Monsoon | 16.9 | 87.2 | 53.5 |
| Autumn | 7 | 54.7 | 23.2 |
| Summer | 3.8 | 22.2 | 6.7 |
| Winter | 0.6 | 0 | 0 |
The results of the season-wise analysis show that the rain detection rate is highest in the monsoon season, which also has the highest percentage of rain data, and falls as the percentage of rain data decreases. This suggests that identifying outliers is easier when the number of outlier instances is high. Data analysts can use this finding to improve forecasting accuracy in domain-specific datasets.
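The "percentage of rain data" column of the season-wise tables is a per-group class share. A minimal sketch of that computation is given below; the helper name `rain_share_by_group` and the toy records are hypothetical and do not come from the ACARR dataset.

```python
from collections import defaultdict


def rain_share_by_group(groups, labels):
    """Return {group: percentage of records labelled rain (1)}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [rain count, total count]
    for group, label in zip(groups, labels):
        counts[group][0] += label
        counts[group][1] += 1
    return {g: 100.0 * rain / total for g, (rain, total) in counts.items()}


# Toy records: one season label and one 0/1 rain label per time step.
toy_seasons = ["Monsoon"] * 4 + ["Winter"] * 4
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
share = rain_share_by_group(toy_seasons, toy_labels)
print(share)  # {'Monsoon': 50.0, 'Winter': 0.0}
```

Each seasonal subset can then be trained and evaluated separately, as in Tables 10 and 11.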