We performed a retrospective content analysis of residency personal statements submitted during two academic years, 2014-2015 and 2015-2016, by applicants who intended to enter residency in the following academic year. The study was conducted in the Department of Pediatrics at the University of Wisconsin-Madison, USA, from November 2017 through June 2018.
As part of the application to US medical residencies, graduating medical students submit the following through the Electronic Residency Application Service (ERAS): biographical information, medical school and standardized testing transcripts, a Dean’s Letter known as the Medical School Performance Evaluation (MSPE) detailing the student’s medical school performance and extracurricular involvement, letters of recommendation, a personal statement, and an applicant photograph. Each residency program then applies its own criteria to select which applicants to invite for interview. After interviews, residency programs rank all interviewees using program-specific scoring systems based on the elements of the residency application and the interview, and applicants rank the residency programs at which they interviewed. Finally, the National Resident Matching Program (NRMP) uses a mathematical algorithm based on applicant and residency program rankings to “match” applicants into residency positions (36).
Although some residency programs may use the personal statement to decide which applicants to invite for interview, our program and many others use only the most objective measures of residency qualification and clinical skill in the application (medical school transcripts and standardized test scores) to select interviewees. Our program then reviews the subjective portions (personal statements, MSPE, and letters of recommendation) after the interview to help determine final applicant rank.
All applications from medical students who interviewed with the pediatrics residency program during the two application cycles were eligible for inclusion, except those of current residents. We chose this population purposefully for this initial study of linguistic characteristics of pediatric personal statements, to help understand gender differences in the writing of the candidates most likely to be interviewed by pediatric residency programs. The study team, pediatrics residency leadership, and the IRB jointly decided to exclude current residents in an effort to reduce the risk that study team members who work with the residents could identify an author from the written content of a personal statement, even in the absence of unique identifiers. The only demographic information provided to the study team was applicant self-reported gender. Gender options in ERAS were “Male,” “Female,” “Decline to answer,” and “No answer.” No applicants in the study pool selected “Decline to answer” or “No answer,” so only male and female comparisons were performed.
We used the text analysis software Linguistic Inquiry and Word Count (LIWC), which has been used in numerous studies evaluating gender bias in medicine and academia (22, 24, 37, 38). LIWC is a linguistic analysis tool for studying cognitive, emotional, and structural aspects of written and spoken language (39, 40). The program searches text for words and word stems that match its internal dictionary of 6,400 words and word stems, and users may also add words to this dictionary manually.
LIWC primarily reports data as the percentage of words in a text that fall into a given category. For example, for the “reward” category, LIWC output is the percentage of all words in the text that are “reward” words (such as “fulfill,” “promote,” and “benefit”). For a subset of variables, output is instead reported as a LIWC-validated scaled score, standardized against large comparison samples of speech and writing from a variety of settings in the general US population (39, 41-44).
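To make the percentage output concrete, the Python sketch below computes a LIWC-style category percentage. It is not LIWC's implementation: the tokenizer, the trailing-`*` stem-matching convention, and the three-entry `REWARD` list are simplifying assumptions for illustration, and the actual LIWC dictionary is far larger.

```python
import re

# Hypothetical toy dictionary: entries ending in "*" match any word that
# begins with that stem; other entries must match a whole word exactly.
# Illustrative only; this is not the actual LIWC "reward" category.
REWARD = ["fulfill*", "promot*", "benefit*"]

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def matches(word, entry):
    """Return True if a word matches a dictionary entry (exact word or stem*)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_percent(text, entries):
    """Percentage of all words in `text` that match any entry in `entries`."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if any(matches(w, e) for e in entries))
    return 100.0 * hits / len(words)

statement = "Caring for children is fulfilling and I hope to benefit my patients."
print(round(category_percent(statement, REWARD), 2))
```

Each word counts toward a category at most once, and the denominator is the total word count of the statement, so longer statements are not penalized or rewarded simply for their length.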
Measures
Linguistic Dimensions, Writing Tone
Linguistic dimensions included word count, average words per sentence, and complex word count (words longer than six letters). The overall writing tone of personal statements was measured with four variables scored relative to population samples of text from a variety of settings (39, 41-44): (1) analytic tone (tendency to use formal versus informal word choices); (2) emotional tone (tendency to use positive versus negative emotion words); (3) clout (tendency to use language expressing expertise versus tentativeness); and (4) authenticity (tendency to use language expressing vulnerability versus guardedness), as scored by the LIWC program.
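The word-level dimensions above can be approximated directly; the hypothetical Python function below is a minimal sketch, assuming that words are runs of letters and apostrophes, that sentences end at `.`, `!`, or `?`, and that the complex-word measure is expressed as a percentage of words with more than six letters. The four writing-tone variables are not reproduced here because they depend on LIWC's own scoring algorithms and normative samples.

```python
import re

def linguistic_dimensions(text):
    """Compute simple word-level descriptors of a text.

    Assumptions (not LIWC's exact rules): words are runs of letters and
    apostrophes, sentences end at '.', '!' or '?', and a complex word has
    more than six letters (apostrophes are not counted as letters).
    """
    words = re.findall(r"[A-Za-z']+", text)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    long_words = sum(1 for w in words if len(w.replace("'", "")) > 6)
    return {
        "word_count": n_words,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "pct_long_words": 100.0 * long_words / n_words if n_words else 0.0,
    }

example = "I love pediatrics. Working with families inspires me!"
print(linguistic_dimensions(example))
```

Sentence splitting on punctuation is a simplification: abbreviations such as "Dr." would be treated as sentence boundaries and lower the words-per-sentence estimate.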
Agentic & Communal Language
Given previous findings of gender differences in the use of agentic and communal language (7, 9, 34, 35), agentic and communal word use were our primary variables of interest. To evaluate agentic and communal language, we selected LIWC dictionary categories used in prior studies of gender differences in residency applications and other social and professional contexts (9, 10, 22, 24, 37, 38). We also manually added dictionaries of agentic and communal words, described previously in a content analysis by Madera et al. and used by Li et al. to evaluate letters of evaluation in emergency medicine (24, 37). Agentic words included “fulfill,” “benefit,” “success,” “achieve,” “think,” “know,” and “confident.” Communal words included “family,” “friend,” “perhaps,” “maybe,” “kind,” and “helpful.” Table 1 lists example dictionary words for each category.
Table 1. Dependent Variables Examined Using Linguistic Inquiry and Word Count (LIWC) (39)

| Category | Example dictionary words |
| --- | --- |
| Agentic language | |
| LIWC dictionary: Reward | Fulfill, benefit, opportunity |
| LIWC dictionary: Risk | Danger, doubt |
| LIWC dictionary: Power | Superior, bully |
| LIWC dictionary: Achievement | Success, better |
| LIWC dictionary: Insight | Think, know |
| LIWC dictionary: Certainty | Always, never |
| Madera et al. dictionary (manually added) | Assertive, confident, ambitious, aggressive, dominant, forceful, independent, daring, outspoken, intellectual, earn, gain, do |
| Communal language | |
| LIWC dictionary: Social affiliation | Family, friend, children |
| LIWC dictionary: Family | Daughter, father, brother |
| LIWC dictionary: Friends | Friend, neighbor |
| LIWC dictionary: Male references | Boy, his, dad |
| LIWC dictionary: Female references | Girl, her, mom |
| LIWC dictionary: Tentativeness | Perhaps, maybe |
| Madera et al. dictionary (manually added) | Affectionate, nurturing, helpful, kind, sympathetic, sensitive, agreeable, tactful, interpersonal, warm, caring, husband, wife, babies, kids, colleagues, they, him, her |
Procedure
All 85 male-authored personal statements that met eligibility criteria were included. Because female applicants made up over 75% of the interviewee pool, we avoided oversampling them by using a random number generator to select an equal number of female-authored personal statements. This yielded 170 personal statements in total (85 from male and 85 from female applicants), which were then provided to the study team for analysis in de-identified form.
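The matched random selection can be sketched in Python with `random.Random.sample`, which draws without replacement. This is an illustration under stated assumptions: the text does not specify which random number generator was used, and the identifiers, the female pool size of 300, and the seed below are all hypothetical.

```python
import random

def sample_to_match(male_ids, female_ids, seed=None):
    """Keep every male-authored statement and randomly select the same
    number of female-authored statements, without replacement."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    selected_female = rng.sample(female_ids, k=len(male_ids))
    return list(male_ids), selected_female

# Hypothetical identifiers; the female pool size here is illustrative only.
male_ids = [f"M{i:03d}" for i in range(85)]
female_ids = [f"F{i:03d}" for i in range(300)]
males, females = sample_to_match(male_ids, female_ids, seed=2017)
print(len(males), len(females))
```

Recording the seed alongside the selected identifiers would let another analyst reproduce exactly which statements entered the analysis.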
In the first phase of content analysis, we assessed whether LIWC categorized words appropriately in pediatric residency personal statements. Using a sample of 10 personal statements, two authors (JB, AG) independently reviewed LIWC output variables to identify words that commonly occur in personal statements but that LIWC might categorize inappropriately. For instance, “Children’s” was frequently used as part of a proper noun (e.g., “Children’s Hospital”) and, unless excluded in this context, would have been counted as a word of social affiliation. Similarly, the words “practice,” “admitted,” and “down” (as in Down syndrome) have different meanings in medicine than in their LIWC categories. One investigator (AG) reviewed every use of these words in all 170 personal statements to ensure appropriate word categorization based on context.
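The study handled these cases by manual review. As a hypothetical automated approximation for fixed phrases only, a pre-processing step could mask context-specific uses before dictionary counting. The rule list and the `xxterm` placeholder below are assumptions for illustration; words such as “practice” and “admitted” still require human judgment, because their meaning cannot be resolved by a simple pattern.

```python
import re

# Hypothetical context rules: each pattern marks a proper-noun or medical
# usage that a general-purpose dictionary would miscount. Both straight and
# curly apostrophes are accepted in "Children's".
CONTEXT_RULES = [
    re.compile(r"\bchildren[’']s\s+hospital\b", re.IGNORECASE),
    re.compile(r"\bdown\s+syndrome\b", re.IGNORECASE),
]

def mask_context_terms(text, placeholder="xxterm"):
    """Replace each word of a matched phrase with a neutral placeholder that
    belongs to no dictionary category, preserving the total word count."""
    for pattern in CONTEXT_RULES:
        text = pattern.sub(
            lambda m: " ".join(placeholder for _ in m.group(0).split()), text
        )
    return text

s = "I volunteered at Children's Hospital with kids who have Down syndrome."
print(mask_context_terms(s))
```

Replacing each masked word with one placeholder token, rather than deleting it, keeps the denominator of every percentage-based variable unchanged.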
Analysis
Next, we entered all personal statement texts into the LIWC program and analyzed them using the LIWC dictionary variables for linguistic dimensions, writing tone, agentic language, and communal language. We then analyzed the personal statements using the manually added dictionaries for agentic and communal language based on the work of Madera et al. (37). For each LIWC output variable, we compared the mean percentage of word use or the mean LIWC-validated scaled score between male- and female-authored personal statements using two-tailed t-tests (significance level p < 0.05) in STATA (STATA Software 15.0) (45).
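For readers without STATA, the comparison can be sketched in plain Python. The text does not state whether equal variances were assumed, so this sketch assumes Student's pooled-variance form; the two-tailed p-value is obtained by integrating the t density with Simpson's rule, and the input values in the usage lines are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value: 1 - 2 * (area under the density from 0 to |t|),
    with the area computed by Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

def students_t_test(a, b):
    """Two-sample t-test assuming equal variances (pooled variance).

    Returns (t statistic, degrees of freedom, two-tailed p-value).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df, two_tailed_p(t, df)

# Hypothetical per-statement percentages for one category (illustrative only).
male_pct = [2.1, 1.8, 2.5, 2.0, 1.6]
female_pct = [1.9, 2.4, 2.7, 2.2, 2.6]
t, df, p = students_t_test(male_pct, female_pct)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

A difference would be declared significant when `p < 0.05`. If unequal variances were a concern, Welch's t-test would be the usual alternative.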
IRB
The Education and Social/Behavioral Science Institutional Review Board at the University of Wisconsin-Madison deemed this study non-human subjects research (2017-1422) and therefore exempt from its review, given that the personal statements for this study were provided to the study team for analysis without identifiers, and the identities of the writers could not be ascertained.