We have shown that faces of men with a higher concentration of circulating testosterone were rated as significantly less attractive by young women on day 13 of their menstrual cycle, during the fertile phase. This result challenges the hypothesis that women in the fertile phase prefer masculine, high-testosterone men. If such men had been preferred by hundreds of generations of ancestral women, as the hypothesis suggests, androgen-associated traits such as a massive jaw and cheekbones, prominent brow ridges, increased musculature, and dominant, aggressive behavior would have been amplified by forces of natural intra-sexual selection [54, 55]. In contrast, several archeological and anthropometric studies have provided compelling evidence that the genus Homo, which includes modern humans, Neanderthals, and other ancestors, has evolved toward smaller faces over time, with Homo sapiens showing the greatest reduction in size [56, 57]. Analyses of numerous human fossils suggest a fast-shrinking face, rapid evolution of mandibular shape and size, and rampant brain growth [57–60]. Hence, the forces of evolution worked against face masculinization and instead favored the development of cognitive abilities.
We showed that the composite attractive male face is slightly smaller than the less attractive one. Humans can detect facial attractiveness very quickly and reliably based on subtle geometric differences in facial features along specific axes in face space [61–62]. The smaller faces and the lower TT concentration in the attractive group are mutually consistent findings, which could be related to diminished androgenic effects in earlier periods of life.
A broader framework for the evolution of advanced intelligence in humans connects a drastic reduction of testosterone-related aggression with diminished sexual dimorphism, which may have begun as early as 3 million years ago. Aggressive behavior can even be predicted from a testosterone-related structural brain phenotype. Lower testosterone most likely favored increased cooperation between individuals for better survival and facilitated a departure from a chimpanzee-like mating system characterized by high levels of physical, competitive male-male violence, which requires a large muscle mass, and by sexual coercion of females [65–67]. Accelerated natural selection against aggression in the last 200,000 years coincided with further feminization of human faces, as evidenced by a reduction in brow ridge projection and a shortening of the upper facial region [63, 67, 68]. Given that testosterone is linked to dominance, egoistic choices, and aggressive and unfaithful behavior in men, the feminization of male faces may indicate preferential selection for increased social tolerance, which allowed humans to work together more productively [66, 69, 70].
To the best of our knowledge, we provide the first empirical support for an inverse relationship between serum TT in men and their facial attractiveness as scored by young women during their fertile phase. In contrast, Roney et al. reported that women prefer natural faces of men with high saliva testosterone. Their raters, however, scored men's faces on random days of the menstrual cycle, and the fertile window was merely estimated from the calendar method and from hormonal concentrations published in other reports. Although each woman rated the faces only once, Roney et al. did not provide crucial information on the distribution of ratings across the cycle. In particular, they did not report how many women were in their fertile days. In addition, we studied Caucasian men and women, whereas Roney et al. studied samples from different ethnic groups.
We measured the serum concentration of TT with a standardized, well-validated clinical methodology between 7 a.m. and 9 a.m., because the concentration can fall by up to 30–35% in the late afternoon. Roney et al. measured saliva 'free' testosterone at various times of the day and regressed transformed testosterone concentrations onto the individual ratings of facial attractiveness. A simple linear regression may not be the right tool to explore the association between testosterone values and correlated ordinal scores of unknown distribution. Furthermore, they did not explain how the regression coefficients and variances from all 75 individual linear regression analyses were pooled.
A more sophisticated statistical approach is needed to model positive correlation among raters’ attractiveness scores of the presented facial images, which could be the result of sequentially dependent attractiveness perception or sequentially dependent response bias [73, 74]. For example, if a rater’s scoring criterion gradually changes over time (e.g., the rater tends to give higher ratings at the beginning of the experiment and lower ratings at the end of the experiment), then the autocorrelation of the rater’s scoring criteria will lead to a positive correlation between current and previous ratings.
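The drifting-criterion example above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the model used in this study: ratings are treated as continuous rather than ordinal, the underlying appeal of each face is drawn independently (as with random image order), and the function names, trial count, and drift size are hypothetical choices made for illustration.

```python
import random


def lag1_autocorr(values):
    """Lag-1 autocorrelation: correlation between each rating and the previous one."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


def simulate_ratings(n_trials=500, drift=0.0, seed=1):
    """Simulate one rater (hypothetical setup).

    Each face has an independent underlying appeal. The rater's criterion
    falls linearly from +drift/2 on the first trial to -drift/2 on the last,
    i.e. higher ratings early in the session and lower ratings late.
    """
    rng = random.Random(seed)
    ratings = []
    for t in range(n_trials):
        appeal = rng.gauss(0.0, 1.0)                    # independent across faces
        criterion = drift * (0.5 - t / (n_trials - 1))  # slow shift in the criterion
        ratings.append(appeal + criterion)
    return ratings


no_drift = lag1_autocorr(simulate_ratings(drift=0.0))
with_drift = lag1_autocorr(simulate_ratings(drift=4.0))
print(f"lag-1 autocorrelation without drift: {no_drift:.2f}")
print(f"lag-1 autocorrelation with drift:    {with_drift:.2f}")
```

In this sketch the faces themselves carry no sequential structure, so any clearly positive lag-1 autocorrelation in the drifting condition comes from the rater's changing criterion alone. A model that ignores this dependence would treat consecutive ratings as independent observations.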
Peters et al. also investigated the association of saliva testosterone with the attractiveness and masculinity of the face and body in photographs of 119 young men, rated by two independent groups of 12 women. Most of these women were taking hormonal contraceptives, which could have altered any potential menstrual cycle effects on the ratings. Testosterone was not correlated with either attractiveness or masculinity; however, the authors implied that the correlation between testosterone and attractiveness was more likely to be negative. Likewise, Neave et al., who used natural photographs of 48 men and 36 female raters, did not find an association between salivary testosterone and attractiveness or masculinity. They did not report, however, the phase of the menstrual cycle of their female raters at the time of scoring.
We measured the concentration of serum TT, whereas most investigators rely on measurements of 'free' testosterone in saliva [71, 75, 76]. Despite the speed and ease of saliva collection and the avoidance of the stress of venipuncture, multidisciplinary clinical experts recommend serum TT as the first-line, reliable indicator of the physiological function of the gonads, with low analytical variation (precision of about 4–10%) and close correlation with calculated bio-testosterone and free testosterone [77, 78]. It is uncertain whether saliva testosterone represents the so-called 'free' fraction, because it binds to albumin, proline-rich proteins, and sex hormone-binding globulin [79, 80]. Hence, saliva testosterone agrees poorly with serum values, even under laboratory-controlled conditions. The substantial analytical error and biological variability of saliva testosterone should be accounted for when attempting to determine a biologically relevant 'least significant difference' threshold [82, 83]. Even under highly controlled conditions, the threshold for salivary testosterone is large (78–90%) [84–86].
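One common way to express such a threshold in laboratory medicine is the reference change value (RCV), which combines analytical and within-subject biological variation into the smallest percentage change between two measurements that is unlikely to be noise. The sketch below assumes this formulation; it is not necessarily the calculation used in the cited studies, and the function name and the coefficient-of-variation inputs are hypothetical values chosen only to show the arithmetic.

```python
import math


def reference_change_value(cv_analytical, cv_within_subject, z=1.96):
    """Reference change value in percent.

    RCV = sqrt(2) * z * sqrt(CV_a^2 + CV_i^2), where CV_a is the analytical
    and CV_i the within-subject biological coefficient of variation (both in
    percent). z = 1.96 corresponds to a two-sided 95% criterion.
    """
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


# Hypothetical inputs for illustration only, not measured values.
rcv_example = reference_change_value(cv_analytical=10.0, cv_within_subject=20.0)
print(f"RCV: {rcv_example:.1f}%")
```

Because the two sources of variation add in quadrature, a large within-subject biological variation dominates the threshold even when the assay itself is precise.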
We allowed each rater 7 seconds to assess each photograph of a man's face in order to capture the first impression of attractiveness, which is most likely rapid, automatic, and mandatory. Raters could have reliably judged the attractiveness of faces presented for just 13 ms, but we chose a longer time and a random display order to minimize the systematic bias of perceived attractiveness toward faces seen up to several seconds before. Xia et al. demonstrated that perceived face attractiveness was pulled toward the attractiveness level of facial images encountered up to 6 s earlier. By giving our raters only one task, we avoided multi-task cognitive overload, which could have affected both the raters' ability to intentionally form attractiveness impressions and the automaticity of impression formation. An additional bias could have been introduced if raters had simultaneously assessed masculinity and a man's attractiveness for a short- or long-term partnership, as has been done elsewhere.
The theory that women's preferences for facial masculinity shift across the menstrual cycle continues to be promoted [23, 25, 33]. We did not assess the masculinity of participants' faces, but the smaller size of the composite face of our attractive group suggests that women prefer more feminized male faces. The theory, however, rests on the premise that there is a significant association between testosterone and masculinity [31–33, 35, 44]. Androgenic influence during puberty likely shapes the "high-testosterone" face, which is purported to be an honest indicator of health and male fitness because testosterone enhances sexual signals but suppresses immune function. A recent meta-analysis found little evidence that testosterone suppresses immune function, whereas another meta-analysis identified the opposite effect: a strong suppressive effect of experimental immune activation on testosterone.
Surprisingly, reports on the association of testosterone with masculinity are contradictory [54, 63, 89, 91]. The association remains an open issue because of unresolved discrepancies between structural and masculinity ratings and because of methodological shortcomings of studies that used abstracted, computer-manipulated images of 'high and low testosterone faces' or saliva testosterone measurements [25, 75, 89, 91, 92]. Recent findings do not support preferences for male masculinity traits in either low- or high-conception-probability groups of women [34, 35, 44, 93, 94]. Pound et al. suggest that raters may attribute 'masculine' ratings to faces they find attractive irrespective of objective sexual dimorphism, owing to stereotypical associations between the term 'masculinity' and attractiveness. Furthermore, perceived masculinity may not correlate with attractiveness [46, 95], especially in view of the association between perceived sexual unfaithfulness and face masculinity.
Women's preferences for types of male faces could differ among populations because of ethnic, socio-cultural, and human development factors [9, 96]. Recent reports showed women's preferences for more feminized faces of Caucasian men [9, 10, 34, 97]. Kocnar et al. reported that feminized male faces were preferred over masculinized faces by women in most European populations, especially in countries with a high human development index [9, 10]. Perrett et al. reported stronger preferences for face feminization among Japanese and Scottish raters than among Caucasian North American raters, whereas Harris et al. found the opposite pattern. Studies comparing preferences among populations provide contradictory results; hence, differences in women's preferences for masculinity or femininity between populations remain an open topic.
Selecting participants of different ages for a study of physical attraction to the opposite sex could affect the ratings and the comparability of results. Two studies of large cross-cultural samples found that males prefer females considerably younger than themselves and females prefer males considerably older than themselves [99, 100]. Most studies are based on samples of male students 18–20 years old, who may still be developing their ultimate secondary sexual characteristics and face masculinity. In a study in which the mean age of the men was 18, female raters favored the faces of men with higher testosterone levels [71, 101]. Male teenagers are less likely to be rated as attractive, masculine men by women in their late twenties or older. This could be a factor in studies in which the mean age of the female raters was substantially higher than the mean age of the men whose computer-modified or even natural faces were used as stimuli [25, 26]. Likewise, female teenagers may give very different ratings than women over 30 years old when assessing masculinity in the faces of teenage males [89, 101]. Individual variation in evaluations of trustworthiness, dominance, and attractiveness is largely shaped by people's personal experiences and the rapidity of their sexual development [98, 102]. Thus, including teenage participants may raise doubts about whether all participants of a study are biologically and psychologically suited to provide generalizable assessments, especially when the age disparity in sexual relationships is taken into account. In our study, we opted for an age-matched sample.
We used natural, non-edited images of male faces to explore associations between men's facial attractiveness and TT. Such an approach links realistic TT measures in men with women's individual ratings. Many other studies of women's preferences across the menstrual cycle in the context of men's perceived attractiveness or masculinity opted to use computer simulations, heavily edited pictures, or constructs of the male face [23, 25, 34, 91, 92, 101, 103]. Computer alteration of face images to obtain a certain degree of masculinity or femininity certainly facilitates face ratings and reduces the cost of a study. At the same time, however, the differences between composite images of faces can be unnaturally large, leading one to believe that certain face images are more appealing to women than raw, non-edited images of male faces. Ratings of images of real faces may provide a more appropriate assessment of women's preferences than ratings of unrealistic computer-manipulated visuals.
Several studies that used surveys of women to determine the fertile window reported contradictory results [25, 26, 34]. The timing of a woman's fertile window is often unpredictable. Wilcox et al. showed that in only about 30% of women does the fertile window fall entirely within the days of the menstrual cycle identified by clinical guidelines, that is, between days 10 and 17. A study in which daily values of sex hormones were determined in female participants across the cycle showed that ovulation occurred as late as 7 days before menses and as early as day 8 of the cycle. Thus, we were fortunate to target the fertile window accurately, as the female participants had not yet ovulated on day 13 of the cycle, as indicated by a low concentration of serum progesterone and a high concentration of estrogen. Although we did not determine whether the women were fertile, we assumed that healthy, young, regularly menstruating women are fertile.
Our sample of female participants was rather small because of the budget constraints of the project. The sample size is, however, comparable to that of previous reports. In contrast to previous studies, our group of raters was more homogeneous and was studied on the same day of the menstrual cycle to reduce sex hormone-related variability. We also used state-of-the-art statistical modeling to further reduce the probability of erroneous inferences. Our preliminary report can help in planning more powerful studies.
In summary, we have demonstrated that young healthy women prefer images of natural faces of young men with lower concentrations of total testosterone in serum on day 13 of the menstrual cycle. The size of the preferred male face tends to be smaller.