We performed a field experiment, a quantitative research method, to investigate the potential causal relationship between the public disclosure of domain registration data and the incidence of unsolicited email advertising [25]. To this end, we registered 66 randomly generated domain names under three gTLDs with eleven domain name registrars. Half of these domains had public registration data. For the other half, the registry or registrar redacted the data in compliance with the GDPR. The redaction concealed specific elements of personally identifiable information (PII), such as the registrant's name, address, and email address. From July 2022 to June 2023, we monitored the associated email addresses for unsolicited emails, defined in this study as any commercial, promotional, or potentially malicious emails that the registrant did not explicitly request.
3.1 Hypotheses
Based on the objectives, research question, and findings from our conceptual foundations, we propose the following hypotheses:
H1: Publishing a domain name's registration data leads to email spam.
H2: The quantity of incoming spam emails varies significantly depending on the gTLD under which the domain is registered (related to H1).
H3: The quantity of incoming spam emails varies significantly depending on the registrar with which the domain is registered (related to H1).
We identified independent and dependent variables in order to examine potential causal relationships in the field experiment. In all hypotheses, the dependent variable, which we seek to explain, is the number of spam emails received at the designated email address. The independent variable is the factor that we hypothesize influences the dependent variable: for H1, it is the publication of the registration data; for H2, it is the gTLD; and for H3, it is the registrar.
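To make this operationalization concrete, the following minimal Python sketch shows one way each monitored domain could be recorded and how the dependent variable could be aggregated over the levels of each hypothesis's independent variable. The record layout and the names `Observation`, `INDEPENDENT_VARIABLE`, and `spam_by_factor` are hypothetical illustrations, not the authors' analysis code. Aggregated totals are descriptive only; they do not by themselves establish the significance that H2 and H3 refer to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One monitored domain (hypothetical record layout)."""
    registrar: str   # independent variable for H3
    gtld: str        # independent variable for H2
    published: bool  # independent variable for H1 (True = experimental group)
    spam_count: int  # dependent variable: unsolicited emails received


# Map each hypothesis to the attribute that serves as its independent variable.
INDEPENDENT_VARIABLE = {"H1": "published", "H2": "gtld", "H3": "registrar"}


def spam_by_factor(observations, hypothesis):
    """Sum the dependent variable over the levels of the hypothesis's independent variable."""
    attribute = INDEPENDENT_VARIABLE[hypothesis]
    totals = defaultdict(int)
    for obs in observations:
        totals[getattr(obs, attribute)] += obs.spam_count
    return dict(totals)
```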
3.2 Control and Experimental Group
The field experiment was performed on a control group and an experimental group [25]. To achieve high external validity, the domains in both groups were registered with the same eleven registrars and under the same three gTLDs [25]. Because, as noted in Section 2, 85% of registries and registrars do not publish registration data, non-publication is the typical case; we therefore assigned all domains whose registration data was not published to the control group. All domains with published registration data formed the experimental group.
3.3 Selection of Registrars
We selected eleven registrars for the field experiment, representing approximately 38.3% of the total market share, as they collectively hold about 134.1 million out of 350.5 million registered domains worldwide [26]. The selection was based on the rankings of the registrars with the most registered domains globally [27].
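The market-share figure can be checked directly against Table 2. The short Python sketch below sums the domain counts of the eleven selected registrars (values transcribed from Table 2, in millions) and divides by the worldwide total of 350.5 million; the helper name `market_share` is a hypothetical illustration.

```python
# Domain counts (millions) of the eleven selected registrars, transcribed from Table 2.
SELECTED_REGISTRARS = {
    "GoDaddy": 77.1, "Namecheap": 16.6, "Tucows": 11.0, "GMO": 5.3,
    "Namesilo": 5.2, "OVH": 5.0, "Dynadot": 3.9, "Key-Systems": 3.0,
    "Name.com": 2.5, "Sav": 2.3, "Gandi": 2.2,
}
TOTAL_DOMAINS = 350.5  # millions of registered domains worldwide [26]


def market_share(counts, total):
    """Return the combined domain count and its share of the total."""
    combined = round(sum(counts.values()), 1)  # round away floating-point noise
    return combined, combined / total


combined, share = market_share(SELECTED_REGISTRARS, TOTAL_DOMAINS)
print(f"{combined} million of {TOTAL_DOMAINS} million = {share:.1%}")
```

Running it prints `134.1 million of 350.5 million = 38.3%`, matching the figures stated above.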
Several registrars belong to corporate groups. To obtain results that are as differentiated as possible, we considered only one ICANN-accredited registrar per group. To preserve the existing ranking, we did not aggregate the domain counts of the group members:
- PDR (#7) is part of Network Solutions (#5)
- eNom (#11) is part of Tucows (#3)
- Alibaba Cloud (#16) is part of Alibaba (#13)
- Wild West Domains (#17) is part of GoDaddy (#1)
We excluded registrars that did not offer the option to consent to the publication of personal data, which was a necessary condition for the field experiment. These registrars included:
- Network Solutions (#5)
- IONOS (#6)
- Alibaba (#13)
- Wix (#19)
- Hosting Concepts (#24)
Three registrars were omitted because they accepted only credit card payment and the author's credit cards were declined:
- Google (#4)
- Reg.Ru (#15)
- Chengdu West (#25)
NameBright (#12) was omitted because it offers only three gTLDs (.com, .net, and .org), which does not provide the variation of gTLDs required for the field experiment. FastDomain (#21) was also omitted because it offered domain registrations only in combination with additional services. Table 2 gives a detailed overview of the selected and excluded registrars.
Table 2: Registrars, ordered by the number of registered domains, with remarks on the selection for this field experiment.
| Rank | Registrar | Number of domains | Remarks |
|---:|---|---:|---|
| 1 | GoDaddy | 77.1 million | selected |
| 2 | Namecheap | 16.6 million | selected |
| 3 | Tucows | 11.0 million | selected |
| 4 | Google | 8.3 million | credit card declined |
| 5 | Network Solutions | 6.3 million | no publication possible |
| 6 | IONOS | 5.8 million | no publication possible |
| 7 | PDR | 5.5 million | belongs to #5 |
| 8 | GMO | 5.3 million | selected |
| 9 | Namesilo | 5.2 million | selected |
| 10 | OVH | 5.0 million | selected |
| 11 | eNom | 4.8 million | belongs to #3 |
| 12 | NameBright | 4.2 million | only three gTLDs |
| 13 | Alibaba | 4.2 million | no publication possible |
| 14 | Dynadot | 3.9 million | selected |
| 15 | Reg.Ru | 3.5 million | credit card declined |
| 16 | Alibaba Cloud | 3.4 million | belongs to #13 |
| 17 | Wild West Domains | 3.2 million | belongs to #1 |
| 18 | Key-Systems | 3.0 million | selected |
| 19 | Wix | 2.6 million | no publication possible |
| 20 | Name.com | 2.5 million | selected |
| 21 | FastDomain | 2.5 million | different business model |
| 22 | Sav | 2.3 million | selected |
| 23 | Gandi | 2.2 million | selected |
| 24 | Hosting Concepts | 2.0 million | no publication possible |
| 25 | Chengdu West | 1.9 million | credit card declined |
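The selection procedure in Section 3.3 amounts to filtering Table 2 by its remarks. The minimal Python sketch below transcribes the rank, registrar, and remark columns and tallies the exclusion reasons; the helper names `exclusion_reason` and `apply_selection`, and the grouping of all "belongs to #n" remarks under one label, are illustrative choices rather than part of the original method.

```python
from collections import Counter

# (rank, registrar, remark) rows transcribed from Table 2.
TABLE_2 = [
    (1, "GoDaddy", "selected"), (2, "Namecheap", "selected"),
    (3, "Tucows", "selected"), (4, "Google", "credit card declined"),
    (5, "Network Solutions", "no publication possible"),
    (6, "IONOS", "no publication possible"), (7, "PDR", "belongs to #5"),
    (8, "GMO", "selected"), (9, "Namesilo", "selected"),
    (10, "OVH", "selected"), (11, "eNom", "belongs to #3"),
    (12, "NameBright", "only three gTLDs"),
    (13, "Alibaba", "no publication possible"), (14, "Dynadot", "selected"),
    (15, "Reg.Ru", "credit card declined"),
    (16, "Alibaba Cloud", "belongs to #13"),
    (17, "Wild West Domains", "belongs to #1"),
    (18, "Key-Systems", "selected"), (19, "Wix", "no publication possible"),
    (20, "Name.com", "selected"),
    (21, "FastDomain", "different business model"),
    (22, "Sav", "selected"), (23, "Gandi", "selected"),
    (24, "Hosting Concepts", "no publication possible"),
    (25, "Chengdu West", "credit card declined"),
]


def exclusion_reason(remark):
    """Collapse the 'belongs to #n' remarks into a single group-membership reason."""
    return "member of a group" if remark.startswith("belongs to") else remark


def apply_selection(rows):
    """Return the selected registrars in rank order and a tally of exclusion reasons."""
    selected = [name for _, name, remark in sorted(rows) if remark == "selected"]
    excluded = Counter(exclusion_reason(r) for _, _, r in rows if r != "selected")
    return selected, excluded


selected, excluded = apply_selection(TABLE_2)
print(len(selected), dict(excluded))
```

The tally reproduces the breakdown in the text: 11 selected registrars and 14 exclusions (five without a publication option, four group members, three payment failures, NameBright, and FastDomain).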
3.4 Selection of gTLDs
We registered the domains under three different gTLDs, selected based on three criteria: the number of registered domains, the registry, and the location of the technical operator (as illustrated in Table 3).
Firstly, we chose the most popular gTLD, .com, which accounts for approximately 161.3 million of the 350.5 million registered domains worldwide, a market share of around 46% [26, 28].
To maintain diversity and avoid over-reliance on a single registry operator, we also included gTLDs run by different registries and by operators based in different countries. Although .net is the second most popular gTLD globally, it is also operated by Verisign and was therefore excluded to ensure variety [29]. The third most popular gTLD, .org, is operated by a different registry, but one that is also US-based [26, 30]. Consequently, we selected .xyz, the fourth most popular gTLD, whose registry is a US company but whose technical operations are managed by the British company CentralNic [26, 31].
For the third choice, we avoided gTLDs operated by US-based registries or managed by a technical operator already represented in the selection. The seventh most common gTLD, .top, was not included because it was not available from all selected registrars (as illustrated in Table 3). We therefore opted for .shop, the tenth most common gTLD, which is administered by a registry based in Japan [26, 32, 33].
Table 3: gTLDs ordered by the number of registered domain names.
| gTLD | Number of domains | Registry | Location of the technical operator |
|---|---:|---|---|
| .com | 161.3 million | Verisign | United States of America |
| .xyz | 4.0 million | CentralNic | United Kingdom |
| .shop | 1.0 million | GMO | Japan |
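The gTLD choice in Section 3.4 can be read as a greedy selection in order of popularity. The Python sketch below simplifies the stated criteria into one rule: skip a gTLD that is not offered by all selected registrars, or that shares a registry or a technical-operator country with an earlier pick. It covers only the gTLDs named in the text. The registry labels other than Verisign and GMO are hypothetical placeholders, the country of .top's operator is left unknown, and the availability of .net and .org is an assumption. This is an illustration of the reasoning, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    gtld: str
    rank: int                        # popularity rank among gTLDs [26]
    registry: str                    # placeholder labels except Verisign and GMO
    operator_country: Optional[str]  # country of the technical operator; None = not stated
    offered_by_all: bool             # available from every selected registrar (assumed for .net/.org)


# Only the gTLDs named in the text; "org-registry", "xyz-registry", "top-registry" are hypothetical.
CANDIDATES = [
    Candidate(".com", 1, "Verisign", "US", True),
    Candidate(".net", 2, "Verisign", "US", True),
    Candidate(".org", 3, "org-registry", "US", True),
    Candidate(".xyz", 4, "xyz-registry", "UK", True),
    Candidate(".top", 7, "top-registry", None, False),
    Candidate(".shop", 10, "GMO", "JP", True),
]


def select_gtlds(candidates, k=3):
    """Greedily pick the k most popular gTLDs with distinct registries and operator countries."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.rank):
        if len(chosen) == k:
            break
        if not cand.offered_by_all:
            continue  # e.g. .top: not offered by all selected registrars
        if any(cand.registry == c.registry or cand.operator_country == c.operator_country
               for c in chosen):
            continue  # same registry or same operator country as an earlier pick
        chosen.append(cand)
    return [c.gtld for c in chosen]


print(select_gtlds(CANDIDATES))
```

Under these assumptions the rule rejects .net (same registry as .com), .org (same country), and .top (availability), and returns `['.com', '.xyz', '.shop']`, the three gTLDs in Table 3.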
3.5 Experimental Design
Our experimental design involved registering six domain names with each of the eleven selected registrars (as illustrated in Table 2), two under each of the three selected gTLDs (refer to Table 3). Each domain was registered for a period of no more than one year. We used new, previously unused email addresses for each set to create the registrar accounts and to register the domains. For these email addresses, we also established two subdomains: 'service' and 'support'.
To generate realistic email addresses, we used randomly generated first and last names in the format [email protected]. Half of the domains in each set were registered without consent to publish personal data, and the other half with such consent (as illustrated in Table 4).
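The resulting design is a fully crossed layout of 11 registrars × 3 gTLDs × 2 publication states = 66 domains. The Python sketch below enumerates it with `itertools.product`. The random-label generator (`random_label`), the label length of 12, and the seed are hypothetical choices for illustration; the sketch does not reproduce the email address format, which is not recoverable from the text.

```python
import itertools
import random
import string

# Selected registrars (Table 2) and gTLDs (Table 3).
REGISTRARS = ["GoDaddy", "Namecheap", "Tucows", "GMO", "Namesilo", "OVH",
              "Dynadot", "Key-Systems", "Name.com", "Sav", "Gandi"]
GTLDS = [".com", ".xyz", ".shop"]


def random_label(rng, length=12):
    """Hypothetical generator for a random second-level domain label."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_design(seed=None):
    """One domain per registrar x gTLD x publication status: 11 * 3 * 2 = 66 domains."""
    rng = random.Random(seed)
    used = set()
    design = []
    for registrar, gtld, published in itertools.product(REGISTRARS, GTLDS, (True, False)):
        label = random_label(rng)
        while label in used:  # redraw in the unlikely event of a collision
            label = random_label(rng)
        used.add(label)
        design.append({
            "registrar": registrar,
            "gtld": gtld,
            "domain": label + gtld,
            "published": published,  # True = experimental group, False = control group
        })
    return design


design = build_design(seed=2022)
print(len(design), sum(d["published"] for d in design))
```

This prints `66 33`: each registrar receives six domains, and exactly half of all domains fall into the experimental group.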
Importantly, we did not set up the domains any further after registration: we did not create a website, submit the domains to search engines, or provide the domains to third parties. This decision was intended to ensure that any unsolicited emails received could be attributed to the publication of domain registration data rather than to other potential sources.
3.6 Monitoring and Data Collection
Throughout the one-year monitoring period (from the end of June 2022 to the beginning of June 2023), we operated a mail server using the open-source software Postfix [34]. We set up a mailbox for each of the 66 email addresses using the open-source software Dovecot [35].
To evaluate the experiment, we assessed the number and type of emails received at the end of the monitoring period. We retrieved the emails using the open-source email client Thunderbird, which was configured to block the loading of external content. This setting ensured that simply opening an incoming email did not signal to its sender that the email had been read [36].
This configuration allowed us to accurately record the volume and content of the unsolicited emails received and, because each email address was associated with a single domain, to attribute each email to the publication status of that domain's registration data. These measures also ensured that our interactions did not unintentionally stimulate or discourage further unsolicited emails, preserving the integrity of the dataset.
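As an alternative to counting in a mail client, per-address message counts can be taken directly from the server-side mail store. The Python sketch below uses the standard-library `mailbox.Maildir` class and assumes, hypothetically, that Dovecot stored each address's mail in its own Maildir directly below one root directory; the text does not state the storage format, and the directory layout and the function name `count_messages` are illustrative. Counting this way only lists message files and never renders message bodies, so no external content is loaded.

```python
import mailbox
from pathlib import Path


def count_messages(root):
    """Return {mailbox name: message count} for every Maildir directly below root.

    Assumes (hypothetically) one Maildir per monitored address; messages in both
    'new' and 'cur' are counted, and no message is opened or rendered.
    """
    counts = {}
    for path in sorted(Path(root).iterdir()):
        if all((path / sub).is_dir() for sub in ("cur", "new", "tmp")):
            maildir = mailbox.Maildir(str(path), factory=None, create=False)
            counts[path.name] = len(maildir)
    return counts
```

Called on the hypothetical root directory, the function returns one entry per address, which can then be joined with the publication status of the corresponding domain.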
3.7 Controlling for Confounding Factors
In a field experiment, the effect is the dependent variable and the cause is the independent variable [25]. Other factors that could influence these variables are known as confounders and can threaten the experiment's internal and external validity. For this study, we implemented six measures to control for these potential confounders [25]:
- Registrar variation: We registered the domains with eleven different registrars. This approach was designed to systematically control the influence of the registrar, mitigating any potential bias from using a single provider.
- Domain and gTLD variation: With each registrar, we registered one domain under each of the three gTLDs with published registration data and one with redacted registration data. This measure was implemented to control the influence of the domain name and gTLD through systematic variation.
- Isolation from third parties: The registered domains were not set up in any other way. We did not create personal websites, submit the domains to search engines, or share them with third-party providers. By taking this step, we ensured that the domains did not attract potential senders through other channels, eliminating this confounding factor.
- Using an existing domain: We created the 66 email addresses under an existing domain (tobiassattler.com) that was not part of the experiment. By doing so, we precluded the possibility that registering a new domain could attract attention from lists of new registrations, thereby eliminating this potential confounder.
- Creation of unused subdomains: We used two new and unused subdomains, 'service' and 'support,' under the existing domain (tobiassattler.com). This measure was designed to prevent potential unsolicited email senders from guessing the email addresses (e.g., [email protected]) and thereby eliminate this disruptive factor.
- No spam filtering: We did not apply any spam filtering, such as allowlisting, greylisting, or blocklisting, to these email addresses. This measure avoided distorting the results by filtering or blocking potential spam emails and thus eliminated spam filtering as a confounding factor [37, 38].
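The registrar and gTLD variation measures rely on a balanced design in which every registrar × gTLD cell contains exactly one domain with published and one with redacted registration data. The Python sketch below checks that property for a list of registration records; the record layout (dicts with `registrar`, `gtld`, and `published` keys) and the function name `is_balanced` are hypothetical.

```python
from collections import Counter
from itertools import product


def is_balanced(registrations, registrars, gtlds):
    """True if every registrar x gTLD cell holds exactly one published and one redacted domain.

    `registrations` is a list of dicts with 'registrar', 'gtld' and 'published' keys
    (a hypothetical record layout).
    """
    cells = Counter((r["registrar"], r["gtld"], r["published"]) for r in registrations)
    expected = set(product(registrars, gtlds, (True, False)))
    # Every expected cell must be present, no unexpected cell may appear,
    # and no cell may be filled more than once.
    return set(cells) == expected and all(n == 1 for n in cells.values())
```

A missing, duplicated, or misassigned registration makes the check fail, which would indicate that registrar or gTLD effects are no longer systematically varied across the two groups.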
By implementing these measures, we aimed to increase the validity of our experiment and ensure the data collected accurately represented the cause-effect relationship between domain registration data publication and the incidence of unsolicited email advertising.