Overall, we observe that the filtering rules have been significantly relaxed over time for both websites and mobile apps. This provides substantial evidence that Saudi Arabia is cautiously opening its digital borders. We summarize the results of our study in Fig. 11, Table 2, and Table 3, and highlight our main observations as follows:
Observation 1: Filtering has relaxed in each website category. We analyzed the accessibility of the website categories discussed earlier and present our observations per category below. The overall conclusion is that filtering has decreased in every category.
a. Adult websites. Unsurprisingly, the most blocked category is Adult, with 85.4%, 82.2%, and 82% of its websites blocked in 2018, 2019, and 2020, respectively. The sites in this list typically host content related to pornography, gambling, drugs, violence, and similar material inappropriate for young audiences. We spot-checked some of the Adult sites that were not blocked and found that most are related to artwork, including comics and caricatures; we did not find any that were, for example, pornographic, gambling-related, or drug-related. We note that the increase in accessibility is minimal for the least blocked categories; by contrast, the most blocked categories (e.g., Adult and Shopping) show the largest additive difference in blocking. This drop is due to websites that used to be blocked and later became accessible. In addition, if we consider the top 200 sites sampled from the three Adult lists corresponding to the three measurements (the lists are almost identical), we find that 98.5% and 94% were blocked in 2018 and 2020, respectively.
b. Shopping websites. The second most blocked category is Shopping. We believe the main reason for the blocking is that these sites sell products considered illegal in the country (e.g., alcohol and guns).
c. Gaming websites. The third most blocked category is Games, in which all blocked sites are related to gambling. Saudi Arabia considers gambling illegal, since it is strictly prohibited under Islamic Shari’a law: there are no state-licensed casinos, bookmakers, or poker rooms, and all forms of gambling are illegal in the country.
d. Globally popular websites. The Global category represents the most popular websites worldwide. In this category, more than 7% of the websites are blocked across the three yearly measurements. Nearly 60% of these blocked sites also belong to the Adult category; the remaining 40% are social network applications such as WeChat and VK, along with a few websites from China. Interestingly, in January 2019, students and faculty at the University of California (UC) were warned not to use WeChat while visiting China; the warning was apparently issued to protect their communications, since the application raises security and privacy concerns [52].
Observation 2: Resolving a filtering mystery: most visited and yet blocked. Surprisingly, 24, 23, and 19 of the most visited websites from Saudi Arabia according to Alexa are blocked in 2018, 2019, and 2020, respectively. One might wonder how a site can be popular in the country while being blocked. Alexa determines the popularity of a site based on two metrics: (1) Unique Visitors, the number of unique visitors to a web page; and (2) Pageviews, the number of URL requests (i.e., HTTP GET requests) for a website [53]. Our multi-layer analysis of filtering resolves the mystery: all blocked sites from this category passed the DNS filtering test. This suggests that users received the requested DNS resolution but were never able to view the web page, since it is blocked by other mechanisms, as we discuss later in this section. It is nonetheless interesting that these sites remain popular even though users fail to access them.
Observation 3: Finding a filtering loophole. We found a filtering weakness: content from a blocked website can be accessed indirectly through another website. For example, aljazeera.com is a blocked news site, while twitter.com is not. A user can reach Aljazeera content through Aljazeera's Twitter account as a "loophole": a link on Twitter points to an Aljazeera article. Since Twitter uses HTTPS, HTTP URL-keyword filtering is not applied, and we found that Aljazeera content can be accessed and viewed from inside the country. This is surprising, since we found evidence that filtering is also performed at the TLS level (see Section 5.4); however, it is not applied in this scenario, presumably because the TLS connection is established with twitter.com rather than with the blocked domain. In other words, the request through twitter.com/AlJazeera is missed by the filtering.
Observation 4: Server-side filtering. We also observe server-side filtering [54], where some websites reject requests from devices within Saudi Arabia; in other words, the website itself refuses to serve content to users in Saudi Arabia. One example is www.sce.com, which belongs to Southern California Edison (SCE), a US company that supplies electricity to the Southern California region. While the site is accessible in the US, HTTP status code 502 is returned when accessing it from Saudi Arabia, indicating that the server side rejected the connection. Our rationale is that an energy company may want to protect its infrastructure: US energy infrastructure has already been the target of cyberattacks, most recently the attack on the Colonial Pipeline.
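Telling server-side filtering apart from national filtering requires a differential comparison between an in-country vantage point and a control machine abroad. The sketch below is a simplified heuristic, not the procedure used in this study: the function name and outcome labels are hypothetical, and a "no local response" result would still need the DNS, TCP, and TLS checks discussed later in this section.

```python
from typing import Optional


def diagnose_http(control_status: Optional[int], local_status: Optional[int],
                  local_warning_page: bool) -> str:
    """Compare one URL fetched from a control vantage point (e.g., the US)
    with the same URL fetched inside the country.

    Status codes are None when no HTTP response was received at all.
    This is a hypothetical heuristic for illustration only.
    """
    if control_status is None or control_status >= 400:
        return "fails from control too"  # not specific to the country
    if local_warning_page:
        return "national filtering"  # the filter's warning page was served
    if local_status is None:
        return "no local HTTP response"  # inspect DNS/TCP/TLS layers instead
    if local_status >= 400:
        return "possible server-side filtering"  # e.g., 502 locally, 200 abroad
    return "accessible"
```

For the www.sce.com case described above, a 200 from the US and a 502 from Saudi Arabia without a warning page would fall into the "possible server-side filtering" bucket.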
Observation 5: Filtering seems to be consistent across ISPs. Regarding the operation of the filtering mechanism, we found that Internet filtering rules were applied uniformly across the different vantage points and ISPs we tested; thus, we suspect that there is no additional ISP-level filtering. Having verified this observation, we show results from only one vantage point (N6) in the remainder of the paper.
Observation 6: The kingdom is moving towards more moderate regulation of digital filtering. With regard to mobile app measurements, our results show that all apps were blocked in 2017, but access was permitted to 67% of them in 2018 and 93% in 2019. We repeated the experiment in 2020 and found that all apps, except WeChat, were accessible; see Section 5.5.
Observation 7: Geopolitical events are reflected in filtering. We found evidence of the impact of real-world events on filtering. See Section 5.6 for more details.
In the remainder of this section, we provide a more in-depth technical analysis of filtering and its mechanisms.
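The consistency check behind Observation 5 amounts to comparing per-domain verdicts across vantage points. The following is a minimal sketch, not the tooling used in this study: the function name and input layout (a map from vantage point to a map from domain to a blocked/not-blocked verdict) are hypothetical, and vantage-point labels other than N6 are placeholders.

```python
from typing import Dict, Optional, Set

# verdicts[vantage_point][domain] is True if the domain was blocked there.
Verdicts = Dict[str, Dict[str, bool]]


def inconsistent_domains(verdicts: Verdicts) -> Set[str]:
    """Return domains whose blocking verdict differs between vantage points.

    A domain missing at some vantage point counts as a disagreement, so
    incomplete data is flagged rather than silently ignored.
    """
    all_domains: Set[str] = set().union(*(v.keys() for v in verdicts.values()))
    disagreements: Set[str] = set()
    for domain in all_domains:
        seen: Set[Optional[bool]] = {v.get(domain) for v in verdicts.values()}
        if len(seen) > 1:
            disagreements.add(domain)
    return disagreements
```

Under this framing, "filtering rules applied uniformly" corresponds to an empty result over the whole test list, which is what justifies reporting a single vantage point.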
5.1 DNS Filtering
We conducted experiments to measure filtering at the DNS level. As shown in Table 2, we did not encounter a significant presence of DNS filtering: the majority of DNS lookups are successful. The numbers shown in the DNS-UDP and DNS-TCP columns correspond to transient connectivity issues such as TIMEOUT and SERVFAIL. A very small portion of DNS lookups consistently returned the REFUSED, NoAnswer, or NXDOMAIN error outcomes, but their number is negligible. To verify that these websites are not actually blocked, we checked their status from our machines in the US and found that they return the same errors, indicating that they have likely gone offline.
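The decision rule in the paragraph above (transient errors are discounted, and persistent errors are attributed to the site only if a control machine abroad sees the same error) can be sketched as follows. The function name and string labels are hypothetical; the labels mirror the outcomes named in the text, where "TIMEOUT" and "NoAnswer" are resolver-level outcomes rather than DNS RCODEs.

```python
# Outcome labels as used in the text. "TIMEOUT" and "NoAnswer" (NOERROR with
# an empty answer section) are resolver-level outcomes, not DNS RCODEs.
TRANSIENT = {"TIMEOUT", "SERVFAIL"}
PERSISTENT_ERRORS = {"REFUSED", "NoAnswer", "NXDOMAIN"}


def classify_dns(local: str, control: str) -> str:
    """Classify a lookup from the in-country vantage point (`local`)
    against the same lookup from a control machine abroad (`control`)."""
    if local == "NOERROR":
        return "resolved"
    if local in TRANSIENT:
        return "transient failure"
    if local in PERSISTENT_ERRORS and local == control:
        # Same error outside the country: the domain is most likely offline.
        return "likely offline"
    return "possible DNS filtering"
```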
We further investigated whether the transport protocol of the request (UDP or TCP) affects the outcome of filtering. Our results suggest that it does not: DNS lookups performed over UDP and over TCP showed no difference. This indicates that DNS over TCP is not restricted in the country, i.e., DNS requests over TCP are not filtered. We also repeated the same DNS lookups against six open DNS resolvers, and the results confirmed these findings. There were minor variations between the behavior of the open DNS resolvers; for example, Comodo had the lowest success rate in DNS lookups (especially over UDP) in all categories.
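To make the UDP/TCP comparison concrete, here is a minimal standard-library sketch that sends the same A-record query over both transports and reads the response code. It is illustrative only: the function names are hypothetical, it implements just enough of the RFC 1035 wire format to read the RCODE (it does not check transaction IDs or parse answers), and it is not the tool used in our measurements. The only wire-level difference between the transports is the 2-byte length prefix that DNS over TCP requires.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, txid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query (RFC 1035) with the recursion-desired bit set."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes every message with its 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def rcode(response: bytes) -> int:
    """Return the 4-bit RCODE (0=NOERROR, 2=SERVFAIL, 3=NXDOMAIN, 5=REFUSED)."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def query_udp(server: str, name: str, timeout: float = 3.0) -> int:
    """Query an IPv4 resolver over UDP; socket.timeout here means TIMEOUT."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return rcode(data)


def query_tcp(server: str, name: str, timeout: float = 3.0) -> int:
    """Query a resolver over TCP using the length-prefixed framing."""
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return rcode(_recv_exact(s, length))
```

Running `query_udp(resolver, domain)` and `query_tcp(resolver, domain)` for the same resolver and domain and comparing the results is the kind of check the paragraph above describes; the same pair of calls can be pointed at different open resolvers.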
5.2 IP Address Filtering
We conducted experiments to check for evidence of IP address filtering. As shown in Table 2, the IP-level connection test failed for a number of websites. The data suggests that the main cause is not the Internet filtering system but DNS lookup failures, since no IP address was retrieved for these sites. For all other websites, the tool was able to connect to their IPs on port 80. This indicates that there is no filtering at the level of TCP (or UDP) connection establishment.
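The IP-level test reduces to attempting a TCP handshake with the resolved address on port 80, and attributing the failure to DNS when there is no address to try. A minimal sketch, with a hypothetical function name and outcome labels:

```python
import socket
from typing import Optional


def tcp_connect_test(ip: Optional[str], port: int = 80, timeout: float = 5.0) -> str:
    """Try a plain TCP handshake to ip:port and report the outcome."""
    if ip is None:
        # Nothing to connect to: the failure belongs to the DNS stage.
        return "no IP (DNS lookup failed)"
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except ConnectionResetError:
        return "reset"
    except OSError as exc:
        return f"error: {exc}"
```

A "connected" result only shows that the three-way handshake completed; filtering at later stages (HTTP, TLS) is invisible at this layer, which is why the following subsections examine those stages separately.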
5.3 HTTP Filtering
We find that the majority of the filtering happens at the HTTP level. In fact, filtering here is roughly two orders of magnitude more prevalent than filtering at earlier stages of the connection; in Table 2, the "HTTP" column dominates. A large number of websites were filtered based on the HTTP URL string (either FQDNs or special keywords): 82.2%, 7.6%, and 6.2% of the Adult, Shopping, and Games websites, respectively, were blocked in 2019. These connections were blocked at the HTTP level: examining the logs, we verified that the TCP three-way handshake between our Saudi machines and the forbidden site’s web server completes successfully.
To fully explore this filtering stage, we sought to determine whether filtering is performed by examining the domain name (i.e., FQDN-based filtering) or by matching keywords (i.e., URL-keyword filtering), as discussed earlier.
a. FQDN-based filtering. We send the GET request directly to the web server of the FQDN we want to test. The filtering system allows the client to send the GET request (i.e., the request is forwarded to the forbidden server); however, instead of a legitimate HTTP response, the client receives an HTTP response with status code 403 (Forbidden). This observation contradicts the findings of Verkamp et al. [17], who report that a spoofed HTTP response with status code 200 is returned, suggesting that the filtering implementation may have changed. The filtering mechanism then directs the user to one of the warning pages shown in Fig. 3 and Fig. 4, depending on the site’s category. The warning page is an HTML <iframe> presenting a warning message in both Arabic and English. Figure 12 shows a Wireshark trace for the blocked website www.betonline.ag (a gambling website). What happens after the prohibited request is received? By analyzing the traces, we found that after receiving the 403 HTTP status code, the client receives TCP-RST packets that force termination of the HTTP connection. The error message displayed when a blocked website is accessed explicitly indicates that this filtering is maintained by a company called WireFilter [55] (see the HTML <iframe> in Fig. 13). Our investigation shows that WireFilter provides web security solutions and services (such as filtering) in Saudi Arabia [56].
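The FQDN-based test can be reproduced with a raw HTTP/1.1 GET that records both the response bytes and whether the connection ended with a TCP RST. The sketch below is hypothetical tooling, not the authors' measurement code: in particular, treating "403 plus an `<iframe>` in the body" as the block signature is an assumption based on the warning pages described above, not a documented WireFilter signature.

```python
import socket
from typing import Tuple


def http_get_raw(host: str, path: str = "/", port: int = 80,
                 timeout: float = 10.0) -> Tuple[bytes, bool]:
    """Send a plain-HTTP GET and return (raw_response_bytes, was_reset)."""
    request = (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    chunks = []
    reset = False
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(request)
        try:
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    break  # orderly close (FIN)
                chunks.append(chunk)
        except ConnectionResetError:
            reset = True  # a TCP RST terminated the connection
    return b"".join(chunks), reset


def parse_status(raw: bytes) -> int:
    """Extract the status code from an HTTP/1.x status line."""
    status_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def classify_http(raw: bytes, reset: bool) -> str:
    """Heuristic verdict; the <iframe> check is an assumption, see lead-in."""
    if not raw:
        return "reset before any response" if reset else "empty response"
    status = parse_status(raw)
    if status == 403 and b"<iframe" in raw.lower():
        return "blocked (403 + warning iframe)"
    return f"status {status}"
```

For example, `classify_http(*http_get_raw("www.betonline.ag"))` run from inside the country would exercise the path traced in Fig. 12; the `reset` flag separately records whether the RST that follows the 403 was observed.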
b. URL-keyword filtering. As we explained earlier, this filtering mechanism searches the URL string against a forbidden list of strings. We find that overall URL-keyword matching plays a significant role in the filtering process.
In more detail, the filtering behaves identically to FQDN-based HTTP filtering, confirming that URL-keyword filtering is in use. The two cases differ when HTTPS is used. When we repeat the keyword-filtering experiments over HTTPS, the filtering does not take effect and the client receives a 404 (Not Found) response, indicating that URL-keyword filtering applies only to HTTP connections. In this case, the connection (including the TLS handshake) is established with the unblocked website (a.com), which reports that the requested page (the URL path we constructed containing the blocked domain) does not exist; no filtering occurs. A breakdown of HTTP filtering results is shown in Fig. 14.
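Why the keyword test succeeds over HTTP but not over HTTPS can be illustrated with a toy model of an on-path keyword matcher. This is a simplification under stated assumptions: the keyword list and function names are hypothetical, and we assume that under HTTPS the filter can see at most the server name, since the request path is encrypted. The a.com URL with the blocked domain in its path mirrors the experiment described above.

```python
from urllib.parse import urlsplit

# Hypothetical forbidden-keyword list, for illustration only.
FORBIDDEN_KEYWORDS = {"betonline.ag"}


def filter_visible_text(url: str) -> str:
    """What an on-path keyword filter can inspect for this URL (toy model).

    Plain HTTP: the Host header and request path travel in cleartext.
    HTTPS: the path is encrypted; we assume at most the server name is visible.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return host
    query = ("?" + parts.query) if parts.query else ""
    return host + parts.path + query


def keyword_filter_triggers(url: str) -> bool:
    """True if any forbidden keyword appears in the filter-visible text."""
    text = filter_visible_text(url).lower()
    return any(keyword in text for keyword in FORBIDDEN_KEYWORDS)
```

In this model, `http://a.com/betonline.ag` triggers the filter while `https://a.com/betonline.ag` does not, matching the observed 404 from the unblocked host.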
5.4 TLS Filtering
We tested how filtering handles requests that use the HTTPS protocol. Recall that HTTPS encrypts the payload, which many filtering agencies may consider undesirable. We found that if the website being contacted is blocked, it remains blocked under HTTPS. Examining the traces, we found that this is due to TLS-level filtering. As shown in Fig. 15, when we try to access the blocked website www.betonline.ag, the filtering system allows the TCP three-way handshake but sends a TCP-RST packet when the client tries to establish a TLS connection. On Windows, sending the GET request returned Windows socket error code 10054 (WSAECONNRESET), indicating that the connection was forcibly closed by the remote side; per the traces, this is the TCP-RST sent by the filtering system. In this case, the browser displays a page (shown in Fig. 16) indicating that the HTTPS/TLS connection could not be established.
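The TLS-level behavior can be probed by separating the TCP handshake from the TLS handshake, so that a reset arriving only after the ClientHello is distinguishable from an earlier failure. A minimal sketch with a hypothetical function name: Python's `ssl` module sends `server_hostname` as SNI in the ClientHello, but which field the filter actually inspects is not established by our measurements.

```python
import socket
import ssl


def tls_probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Report whether a failure happens at the TCP or at the TLS stage."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"TCP failed: {exc}"
    context = ssl.create_default_context()
    try:
        # wrap_socket performs the TLS handshake, sending server_hostname as SNI.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return f"TLS established ({tls.version()})"
    except ConnectionResetError:
        # Surfaces as WSAECONNRESET (10054) on Windows, ECONNRESET elsewhere.
        return "TCP ok, reset during TLS handshake"
    except ssl.SSLError as exc:
        return f"TCP ok, TLS error: {exc}"
    except OSError as exc:
        return f"TCP ok, other error: {exc}"
    finally:
        raw.close()  # no-op if wrap_socket already took over the descriptor
```

The pattern shown in Fig. 15 would correspond to the "TCP ok, reset during TLS handshake" outcome, whereas an unblocked site would reach "TLS established".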
5.5 Mobile Application Filtering
In this section, we report our results on mobile application filtering, starting with some context regarding the policies of Saudi Arabia.
Historical context regarding mobile app usage. In 2013, CITC blocked the Voice over Internet Protocol (VoIP) call services of Viber, a popular mobile application that offers free video/voice calls [57]. VoIP calls on similar applications, including FaceTime, Skype, Line, Tango, Facebook Messenger, WhatsApp, and Snapchat, were gradually blocked as well. We believe that the main reason behind this ban was economic, since these applications provide free alternatives to services that otherwise generate revenue for cellular carriers and ISPs. CITC received requests from service providers such as Mobily and STC to block free or low-cost VoIP calls on these applications to protect their competitiveness and rights [58, 59]. However, in 2017, CITC responded to citizens' demands and announced its intent to lift the ban on all applications that provide voice and video communications over the Internet, as long as they meet the country's regulatory requirements [60]. We conjecture that this decision was also driven by the Vision 2030 and National Transformation 2020 programs, published with the aim of modernizing society; one of their stated goals is to provide transparency and clarity with respect to policies, especially in the telecommunications and information technology sectors.
We conducted our measurements of mobile apps over three years and show the results in Table 3. We consider 16 communication apps, including FaceTime, Tango, Line, Viber, SOMA, YeeCall, Facebook Messenger, WhatsApp, Snapchat, and imo, all of which support text, audio, and video communication. We attempted to install and use them on two iPhones, one in Saudi Arabia and the other in the USA, and tested each of the text, audio, and video communication services.
In March 2018, five applications failed to establish at least one of the text, audio, or video communication services, as shown in Table 3. WhatsApp and Viber established an active connection for 1–2 seconds, but then the calls were suddenly disconnected. We believe this indicates that these two applications were indeed blocked in Saudi Arabia [61]. We also found that VoIP calls on imo were blocked in Saudi Arabia.
In October 2019, we repeated the experiment and added two more applications: Houseparty and WeChat. We found that both audio and video calls are blocked on WhatsApp, which was corroborated by a CITC statement [62].
The WeChat application exhibits a uniquely interesting behavior. WeChat is one of the most popular messaging applications and is owned by the Chinese company Tencent [52, 63, 64]. In Saudi Arabia, installing the application comes with a precondition: the user has to show that s/he has a friend on WeChat who meets additional requirements, as shown in Fig. 17. These requirements are that the friend: (a) has been a WeChat user for at least one month if s/he is an international user, or for 6 months if s/he is a China Mainland user; (b) has not completed the "Help Friend Register" check for another new user in the past month; (c) has not been blocked from using WeChat in the past month; and (d) has activated WeChat Pay, if s/he is a China Mainland user. By contrast, WeChat can be installed in the US without any such requirements. Intrigued, we repeated the experiment with three more iPhones in Saudi Arabia; the installation failed on all of them for the same reason. Although the vice president of Tencent announced in 2013 that WeChat is available in Saudi Arabia [65], this is not fully accurate. We currently cannot determine whether the installation failure is caused by Tencent or by the Internet filtering system.
All other tested messaging applications are open and supported including Viber and imo.
Finally, in April 2020, we repeated the experiment and found that nearly all previously-blocked messaging applications were accessible, including WhatsApp, with the only exception being WeChat.
5.6 The Effect of Geopolitical Events on Internet Filtering
Over the past few years, the Middle East has experienced several major political events with worldwide ramifications. We examine the effect of these events on Saudi Arabian policies regarding access to information, as discussed below.
A prominent event was the rise of the so-called “Islamic State” in Iraq and Syria (known as ISIS) over the last decade, with global impact. We tested the accessibility of a number of ISIS-friendly websites and found that all of them were blocked. We obtained these sites by searching for prominent ISIS-friendly websites on the web and following links from authoritative sources; following previous practice, we do not publicly disclose these sites due to ethical considerations. ISIS and its affiliates have exploited social media websites, such as Twitter, to spread their propaganda and recruit new members [66]. The Saudi Arabian government has in turn countered this activity by regulating information access for Saudi citizens. For instance, at the Shura Council, the chairman of the Islamic and judicial affairs committee called for blocking all ISIS websites, as they were considered sources of terrorism and destabilization in the region [42]. Many other countries and institutions have also restricted access to these sites [67–71]. In addition, many Saudi citizens launched an online campaign on Twitter aiming to lock down user accounts belonging to or supporting ISIS [72].
Another prominent event is the increased political tension between Qatar and Saudi Arabia, which was also captured in our measurements. Because of these tensions, the Saudi authorities blocked some Qatari news websites [73] in 2017. For instance, as shown in Fig. 4, a warning page from the Ministry of Culture and Information was displayed when we tried to visit www.aljazeera.com, one of the most popular news websites in Qatar. In addition, in April 2020, we obtained a list of Qatari news sites [73] and found that all of them were blocked.
Another notable geopolitical event in our study period is the ongoing conflict between Saudi Arabia and Iran. Following an attack on the Saudi embassy in Tehran in January 2016 [74], Saudi Arabia cut all diplomatic relations with Iran. Our measurements show evidence of this event through its impact on Internet filtering. In particular, our measurements in 2018 show that many of the top Iranian sites (mostly from the News category) were blocked.
Finally, in April 2020, we observed a change in access to some Turkish sites compared to earlier measurements. Upon investigation, we found that the Saudi authorities had blocked two prominent Turkish news platforms, Anadolu and TRT Arabic, citing what the Ministry of Media described as continued violations of its regulations. We conjecture that the move was partly driven by a Twitter campaign by Saudi citizens calling for the Turkish news platforms to be blocked [75].