This study was conducted to understand the difference in toxicity concerns between clean and non-clean cosmetic products (n = 356) using EWG’s Skin Deep framework. Compared to non-clean products, clean products had a lower EWG score (by 0.71 points). Fragrance products were associated with a higher EWG score than other product categories (2.42; 95% CI: 1.57–3.27), likely attributable to the presence of ingredients linked to worse health effects and greater toxicity concerns. Among both clean and non-clean products, increasing levels of concern for cancer and for allergy and immunotoxicity were each significantly correlated with an increase in EWG score.
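As a reading aid for the fragrance estimate above, the following minimal sketch recovers the standard error implied by the reported interval. It assumes a symmetric, two-sided 95% interval under a normal approximation (multiplier 1.96), which the text does not state explicitly. The variable names are illustrative only.

```python
# Reported estimate for fragrance vs. other products (values from the text).
estimate = 2.42
ci_low, ci_high = 1.57, 3.27

# Assumption: two-sided 95% interval built with a normal approximation.
Z_95 = 1.96

# Half-width of the interval and the standard error it implies.
half_width = (ci_high - ci_low) / 2
std_error = half_width / Z_95

# Ratio of the estimate to its implied standard error.
z_stat = estimate / std_error

print(f"half-width = {half_width:.2f}")
print(f"implied SE = {std_error:.3f}")
print(f"estimate / SE = {z_stat:.2f}")
```

Because the interval excludes zero, the ratio of the estimate to its implied standard error exceeds 1.96, consistent with a difference that is statistically significant at the 5% level under these assumptions.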
This study is the first pilot study to compare Sephora's clean seal system, a popular marker for clean beauty products, with EWG’s Skin Deep review system. We provide additional insight into the levels of toxicity concern for different types of beauty and lifestyle products, as well as possible areas of discordance between clean and non-clean products. For example, drawing on the scientific research it has gathered, EWG takes more “unacceptable” ingredients into account in its score than Sephora’s clean standard list does. This discrepancy is important because consumers are more likely to purchase clean cosmetics when products carry eco-friendly labels, even without any knowledge of toxicity concerns (15). This study also provides an example of how consumers can adopt a framework such as EWG’s Skin Deep ranking system to better understand the specific health effects of products and their ingredients.
Although products with a clean seal are cleaner than non-clean products based on their EWG scores (3.23 and 4.11, respectively), the clean seal, which identifies products formulated without parabens, phthalates, formaldehydes, and more, may not be a good marker for consumers seeking healthy cosmetic products and is potentially misleading. Several areas of discordance are evident between clean and non-clean products based on their EWG scores and levels of toxicity concern. The distributions of EWG scores for clean and non-clean products largely overlap (Figure 1). Also, a clean seal does not guarantee lower toxicity or fewer concerns regarding cancer, developmental and reproductive toxicity, allergy and immunotoxicity, or use restrictions. Our results showed that the distributions of these four types of toxicity concern are similar among clean and non-clean products (Table 2). The majority of both clean and non-clean products are associated with lower cancer concern, while the fewest products are associated with high concern (Table 2). However, when examining the distribution of EWG scores for clean and non-clean products across the four main categories of toxicity concern, only non-clean products show a positive trend between EWG score and concern for cancer and for allergy and immunotoxicity (Figure 3). The distributions of EWG scores for developmental and reproductive toxicity concerns and for use-restriction concerns indicate no significant association between EWG score and toxicity concern level. In addition, most clean and most non-clean products have higher rather than lower allergy and immunotoxicity concerns, and the share is larger among clean products (61.67%) than among non-clean products (51.14%) (Table 2). The majority of both clean and non-clean products also fall under high or moderate use-restriction concerns.
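The group-level comparisons in this paragraph reduce to simple differences, sketched below using only the figures quoted in the text. The variable names are illustrative. The unadjusted score gap computed here need not equal the 0.71 difference reported earlier if that estimate comes from an adjusted model, which is an assumption on our part rather than something stated in the text.

```python
# EWG scores quoted in the text for each group.
score_clean = 3.23
score_non_clean = 4.11

# Share of products with higher allergy and immunotoxicity concern (Table 2).
pct_allergy_clean = 61.67
pct_allergy_non_clean = 51.14

# Unadjusted gap in EWG score (non-clean minus clean).
score_gap = score_non_clean - score_clean

# Gap in the share of products with higher allergy and immunotoxicity
# concern, in percentage points (clean minus non-clean).
allergy_gap_pp = pct_allergy_clean - pct_allergy_non_clean

print(f"unadjusted EWG score gap = {score_gap:.2f}")
print(f"allergy/immunotoxicity gap = {allergy_gap_pp:.2f} percentage points")
```

The two gaps point in opposite directions: clean products score lower overall, yet a larger share of them carry higher allergy and immunotoxicity concern.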
There are several possible reasons why fragrance products have a higher EWG score. Firstly, ingredients commonly found in fragrance products (and less commonly found in other product categories) may contribute to a higher score. These ingredients generally carry a higher concern for the four main toxicity categories, as well as for other categories such as endocrine disruption, irritation, and non-reproductive organ system toxicity (14). For example, benzyl alcohol acts as an aromatic agent, preservative, and solvent, and is associated with allergies and occupational hazards as well as non-reproductive organ toxicity (16). Secondly, fragrance compounds can be present in many different forms and are listed under different names, including fragrance, alcohols, esters, ketones, aldehydes, and alkalis, which are usually classified differently (17). Fragrance products labeled with terms such as “floral”, “exotic”, and “musky” may not accurately disclose their exact formulations and may omit mixtures of natural and synthetic chemicals linked to reproductive problems (18). Thirdly, fragrance products contain an average of 14 unlisted chemicals, and 80% of fragrance products are not tested (18). Results from product testing show that unlisted ingredients may include galaxolide and tonalide, which have shown potential for endocrine disruption in in vitro studies and environmental toxicity affecting growth in fish and crustaceans (19). Fourthly, according to analyses conducted by EWG, the fragrance industry has published safety assessments for only 34% of the unlabeled ingredients (19). Although the available data are limited for fragrances (as summarized in Appendix B), EWG’s product testing and assessments have resulted in fair data availability for ingredients such as fragrance and benzyl alcohol. These ingredients may have a larger influence on final product scores than other ingredients and may ultimately result in higher scores among fragrance products.
This study has a few limitations. We included only products from Sephora that had also been reviewed by EWG Skin Deep, so the fraction of included products (n = 356) is relatively low compared to the 8,043 cosmetic products on Sephora’s platform and the 71,774 from EWG. Selection bias is nevertheless less likely for two reasons: 1) The number of products listed by both systems may be overcounted, because many products containing exactly the same main ingredients but with minor modifications to ingredient proportions, flavors, aromas, and/or colors were counted as multiple products. 2) EWG reviews products independently, without cherry-picking specific products and regardless of clean seals. Therefore, the reviewed products fairly represent the products on Sephora.
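To make the scale of this limitation concrete, the sketch below computes the share of each catalogue covered by the 356 included products, using only the counts quoted above. The variable names are illustrative. Because the catalogue counts may be inflated by near-duplicate listings, these shares should be read as rough lower bounds on coverage.

```python
# Counts quoted in the text.
n_included = 356
n_sephora = 8_043
n_ewg = 71_774

# Share of each catalogue represented by the included products.
share_of_sephora = n_included / n_sephora
share_of_ewg = n_included / n_ewg

print(f"share of Sephora listings = {share_of_sephora:.2%}")
print(f"share of EWG listings     = {share_of_ewg:.2%}")
```

The included products amount to roughly 4.4% of Sephora's listings and about 0.5% of EWG's.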
The products included were those available in Sephora’s “clean beauty” section that had been examined and scored by EWG’s team of scientists. An increase in data simultaneously available on both websites would help create a more representative sample, especially for fragrance and hair products. This may be achieved through more frequent product updates or more active data exchange between the two parties. Also, some of the products listed in EWG’s Skin Deep database reflected old formulations that had not been updated, so some EWG scores may not accurately reflect current formulations. Some products were also limited in data availability, suggesting that the literature for various ingredients is lacking or needs further investigation by EWG. Thus, EWG Skin Deep’s weight-of-evidence approach to scoring may not account for unavailable or limited information on the hazardous effects of certain ingredients.
The binary “clean beauty” labeling system at Sephora may not capture the nuances of EWG’s ten-point scoring system. It may be insufficient for consumers to rely solely on the presence of the clean seal when purchasing beauty products. By using a reputable framework to evaluate products, our study helps fill knowledge and regulatory gaps that exist for cosmetic products. Consulting supplementary frameworks and primary data sources may reinforce and expand on the existing data and criteria provided by EWG, as well as address remaining gaps. Additionally, it would be beneficial to examine the economic and regulatory implications of “clean” beauty in other countries (e.g., cosmetics of prohibited use, other prioritized measures of toxicity concern, and the role of regulatory bodies in the cosmetic industry).