The summary of VGI processing showed that the count of tweets increased until the midpoint of the event. This peak can be observed between the 27th and 29th of August for both total and relevant tweets and coincided with high WD observations, with maxima of 5 m, 3 m, and 4.8 m on the 27th, 28th, and 29th, respectively. There was also a weak positive correlation between the daily average WD of VGI and that of the gauges (r = 0.23) (Fig. 16). In addition, 29.4% of all tweets were classified as relevant, and 76.5% of these relevant tweets were posted from the 25th to the 29th of August, while the hurricane lasted, which indicates that the majority of relevant tweets were posted during the disaster rather than before or after it.
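As an illustration of the daily comparison reported above, the following minimal sketch computes a Pearson correlation coefficient between two daily mean WD series. It assumes that the reported r is a Pearson coefficient, which the text does not state explicitly; the function name `pearson_r` and the sample values are hypothetical placeholders and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily mean WD in metres (placeholders, not the study's data).
vgi_daily_mean = [0.4, 0.9, 1.3, 1.1, 0.8]
gauge_daily_mean = [0.5, 0.7, 1.6, 1.0, 1.2]
r = pearson_r(vgi_daily_mean, gauge_daily_mean)
```

A coefficient close to zero, such as the 0.23 reported here, would indicate that the two daily series move together only loosely.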
5.1 Research Questions
Regarding the first and second research questions, there was a significant difference in the spatial and temporal distribution of VGI modalities (p < 0.001 for both comparisons). This could be associated with the limited availability of text observations (5%) compared with multimedia (about 95%), which was also demonstrated by Li et al. (2017). This finding suggests that disaster studies focusing on text might capture only a small fraction of the available VGI and hence potentially introduce sampling bias into their investigation. Thus, the limited sample size of text messages could reduce the quality of the assessment outputs and influence the decisions based on such outputs. When conducting a damage assessment or rapid flood analysis, the user should consider that the text modality requires additional text analysis (semantic and sentiment) to extract damage information or flood conditions.
The results of the Wilcoxon post-hoc test indicated that people tend to share multimedia content about their conditions and situation rather than describing them in text messages. Because the picture and video modalities showed no significant difference in frequency (p = 0.557) and multimedia is abundantly available during disasters, VGI multimedia should be considered for inclusion in big data studies on account of its larger sample size and richer context compared with text. Moreover, the large availability of multimedia could give the user broader insight for rapid assessment of flood damage or risk, with more spatial and temporal coverage. On the other hand, extracting flood-related information from multimedia requires image interpretation and digital processing, which can be time-consuming and labor-intensive and can present practical barriers to the acute needs of emergency response.
Regarding the kernel density, the results of the Friedman test showed that the VGI data modalities differ in spatial extent. This could be related to the limited sample size and distribution of text observations, which were concentrated near downtown Houston and sparse towards the fringe of Harris County. In contrast, pictures and videos were scattered across the study area, with more coverage in space and time than text (Fig. 17). In addition, WD visualized in multimedia provides better spatial context because the multimedia could be captured at varying distances from the actual flooded area. For example, one picture could be taken from the third floor of a building and show a flooded parking lot in front of the building, while another could show WD up to the sidewalk across the street in a small neighborhood, a short distance from where the user was standing when taking the picture or video. Text information, however, lacks such spatial context unless it is tediously described, and it is often assumed to indicate WD at the location of the geotagged tweet. Zooming could also influence the distance between the measured WD point and the actual multimedia location. Although pictures and videos had visually similar distributions, the post-hoc test showed a significant difference between the two modalities. This could be related to the difference in sample size between pictures (350) and videos (206) and to the broader spatial distribution of pictures, which may influence the kernel density outputs (Fig. 18).
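To make the comparison across modalities concrete, the sketch below computes the Friedman test statistic for kernel density values sampled at the same cells for text, pictures, and videos. It is a minimal illustration under stated assumptions: the sampling of densities at common cells, the function names, and the example values are all hypothetical, and the statistic is computed without a tie correction, so it may differ slightly from the output of a statistics package when ties are present.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction) and its degrees of freedom.

    `blocks` is a list of rows; each row holds one value per treatment
    (here: one kernel density value per modality at the same cell).
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, k - 1


# Hypothetical densities per cell: (text, pictures, videos). Not the study's data.
density_by_cell = [
    [0.02, 0.31, 0.18],
    [0.00, 0.12, 0.15],
    [0.05, 0.27, 0.09],
    [0.01, 0.40, 0.22],
]
q_stat, dof = friedman_statistic(density_by_cell)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom to obtain a p-value; a significant result would then be followed by pairwise post-hoc tests such as the Wilcoxon test mentioned in the text.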
Another explanation for the spatial variation of VGI could be the spatial pattern of the digital divide, where population subgroups may have varying access to digital information and communication technology (ICT) (Riggins and Dewan 2005). The digital divide index (DDI) calculated by Gallardo (2018) is composed of two main components: infrastructure/adoption and socioeconomic characteristics. The DDI score ranges from 0 to 100, where a high score indicates a wide digital divide and vice versa. Overall, there was a weak negative correlation between the DDI and the count of VGI points in each tract in the study area, with coefficients of −0.09, −0.20, and −0.16 for text, pictures, and videos, respectively. Therefore, the digital divide did not show a significant influence on the spatial distribution of VGI. Additionally, most VGI observations occurred in areas with a DDI below 50 (moderate to low digital divide), and areas with a DDI between 45 and 50 had the highest frequency of VGI observations (Fig. 19). This finding supports the correlation results above, in which the digital divide did not show a significant impact on the VGI data.
The temporal distribution of the data also varied significantly according to the chi-square test (Table 3). Once again, the count of text observations might play a role in this variation. Furthermore, users in some areas might have experienced technical issues, such as power outages or temporarily limited internet access, that prevented them from sharing their situation in real time during the event. According to power outage reports between August 27th and September 1st, about 71.2% of VGI observations were in areas with up to 5% power outage, while 22.8% of the observations were in areas with between 5% and 20% power outage, which might suggest an inverse relationship between the count of tweets and the percentage of power outage in the study area (Fig. 20). A Friedman test revealed a significant difference in the count of VGI observations among the power outage classes (p < 0.001), indicating that power outage might be related to VGI frequency across data modalities.
The overall findings of the analysis of VGI data modalities suggest that the limited count of text observations constrains its contribution to flood analysis when text is used as a single source of WD input. A previous study noted that limited counts of text observations, along with uneven spatial coverage, could result in an underestimated flood extent (Li et al. 2017). Nevertheless, integrating VGI with traditional flood mapping methods is recommended, despite the limitations of using it on its own (Rosser, Leibovici, and Jackson 2017).
To examine the quality of WD derived from VGI and RS (i.e. the second research question), the two WD datasets were compared, and the results showed a significant difference between them. It is important to note that the VGI surface used was interpolated, which could have “smoothed” the values and narrowed the differences between the two datasets. For example, the interpolated surface has WD values ranging between 0.4 and 1.8 m, while VGI had a maximum WD of 3 m. Because the only post-disaster RS image available was from September 1st, VGI observations on that day were sampled and aggregated for interpolation, and locations where multiple tweets coincided (e.g. 3 m along with other WD values) were averaged before interpolation. One complication in interpreting this finding, however, is the relatively coarse spatial resolution of the RS data used (20 m after resampling). Pan-sharpening methods could be used to detect water pixels more accurately at finer detail (Du et al. 2016). However, cloud cover limits the availability of usable RS images and the ability to detect water, leading to possible underestimation of water bodies (Schnebele et al. 2014; Li et al. 2017).
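The averaging of coincident observations described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the tuple layout, the function name `average_coincident_depths`, and the rounding of coordinates to five decimal places to decide that two tweets coincide are all assumptions.

```python
from collections import defaultdict


def average_coincident_depths(observations, precision=5):
    """Average WD for observations that share the same (rounded) coordinates.

    `observations` is an iterable of (lat, lon, depth_m) tuples.
    `precision` is an assumed number of decimal places used to decide
    that two points coincide.
    """
    grouped = defaultdict(list)
    for lat, lon, depth_m in observations:
        key = (round(lat, precision), round(lon, precision))
        grouped[key].append(depth_m)
    # One averaged depth per unique location, ready to feed an interpolator.
    return {key: sum(depths) / len(depths) for key, depths in grouped.items()}


# Hypothetical observations for a single day (not the study's data).
points = [
    (29.76, -95.37, 3.0),
    (29.76, -95.37, 1.0),
    (29.80, -95.40, 0.5),
]
averaged = average_coincident_depths(points)
```

Averaging in this way gives the interpolation a single value per location, but it also pulls extreme observations towards the local mean, which is one way an interpolated surface can end up with a narrower WD range than the raw VGI.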
The third research question examined the validation of WD from VGI against the authoritative data sources of USGS stream gauges and FEMA depth grids. Using the stream gauges as a reference, the analysis showed no significant difference in WD between VGI and the gauges. Given that water information is available at fine temporal resolution for both datasets, this absence of a difference between the data sources is indicative of the quality of VGI. In addition, the close proximity between the geographic locations of the stream gauges and the VGI points could also contribute to such agreement.
The second validation was against FEMA depth grids. The analysis showed a significant difference between the two datasets. One uncertainty behind this finding was the manual extraction of WD from VGI, where WD may depend on the physical reference (e.g. height of a curbside or hydrant) being used (Table 4). In contrast, the modeled depth grids from FEMA, simulated at a given time, would have higher internal consistency. In addition, the difference in spatial distribution between the FEMA depth grids and VGI might also have influenced the comparison between the two datasets. The depth grids from flood simulation were mainly modeled and calibrated at stream gauges, whereas the majority of the VGI observations were closer to urban landscapes, and many were found within residential areas (Fig. 15). As mentioned, the low spatial accuracy of some VGI points could jeopardize the examination of any significant differences between the two datasets. Nevertheless, the finding of no significant difference between VGI and USGS stream gauge data, alongside a significant difference between VGI and the modeled WD grids, suggests that multiple factors contribute to the propagation of uncertainties in flood modeling.
Most of the matching tweets did not include WD information; only 4.5% of all Harvey-related tweets did. This could be associated with the selected hashtags and keywords, which might filter out tweets that could be counted as relevant, or with the way users share their experience of the hurricane using words that might not be explicitly related to the event. One reason for the large count of relevant tweets is participation and engagement: many Harvey-related hashtags were promoted by individuals and various governmental agencies as a tool for disseminating information about the event (Smith et al. 2015). Nevertheless, these promotion efforts did not target the solicitation of WD information from the public. By drawing on established protocols that deploy specific hashtags to collect structured VGI useful for disaster response (Starbird and Stamberger 2010), it is possible to improve the quantity and quality of relevant tweets.
Other reasons for the limited count of tweets with relevant WD information could be the overwhelming use of a popular hashtag (e.g. #HurricaneHarvey) on irrelevant content, missing or broken links, or the reposting of the same content (e.g. retweets) by multiple users (Fig. 21). These factors increase the time and effort required to collect, preprocess, classify, and extract relevant information about the disaster event from VGI sources. A fifth possible source of the limited count of water observations is the overlap of geotagged tweets. For example, 44 tweets with water observations from different users shared the same latitude and longitude on different days. This could be related to users attaching the same location when posting a tweet. Multiple users might prefer to tag the generic name of a place (e.g. Houston) in a tweet, which would assign the same coordinates to all tweets with such tags (Steiger et al. 2015).
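Overlapping geotags of the kind described above can be flagged with a simple grouping step. The sketch below is illustrative only: the record fields (`id`, `user`, `lat`, `lon`), the function name `shared_geotags`, and the sample records are hypothetical and do not reflect the study's data schema.

```python
from collections import defaultdict


def shared_geotags(tweets, min_users=2):
    """Return tweet ids grouped by coordinates tagged by several distinct users.

    `tweets` is an iterable of dicts with the assumed keys
    'id', 'user', 'lat', and 'lon'.
    """
    users_at = defaultdict(set)
    tweets_at = defaultdict(list)
    for tweet in tweets:
        key = (tweet["lat"], tweet["lon"])
        users_at[key].add(tweet["user"])
        tweets_at[key].append(tweet["id"])
    # Keep only locations shared by at least `min_users` different users.
    return {
        key: tweets_at[key]
        for key, users in users_at.items()
        if len(users) >= min_users
    }


# Hypothetical records (not the study's data).
sample = [
    {"id": 1, "user": "a", "lat": 29.7604, "lon": -95.3698},
    {"id": 2, "user": "b", "lat": 29.7604, "lon": -95.3698},
    {"id": 3, "user": "a", "lat": 29.9000, "lon": -95.5000},
]
overlaps = shared_geotags(sample)
```

Locations flagged in this way are candidates for generic place tags rather than true observation points and could be reviewed or down-weighted before interpolation.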
Overall, the study limitations were associated with the preprocessing and extraction of WD from VGI and with the availability and analysis of RS data. Manual extraction from relevant VGI is time-consuming and might be difficult when no obvious physical marks are available, or when they are distant from the position of the user who took the picture or video. Moreover, when an image is retweeted by multiple users, duplicate posts are difficult to find in a large amount of data, which may lead to overestimating the count of relevant tweets. In addition, some of the tweets classified as relevant according to the hashtags and keywords shared non-relevant information (e.g. family pictures or food pictures), and a group of tweets shared the same geographic location, which influences the analysis. Furthermore, some multimedia were of poor quality, with lighting insufficient to observe the water level.
The availability of RS data with acceptable cloud coverage during or right after the event was limited, because weather conditions suitable for remote sensing are rare during a storm or hurricane. In addition, the spatial resolution (20 m) influenced the results of the delineated water bodies. The processing, extraction, and interpolation of VGI to WD were time-consuming and computationally intensive for a county or regional project.