Recently, while catching up on some of my magazine reading, I noticed something surprising: without looking at the byline, I was able to guess whether the author of any given article was a woman or a man with better than 80% accuracy. Better still, I could usually pull this off within the first four paragraphs of any given piece.
How was I able to do that? I wondered. What was tipping me off? And what kind of research have people done on this topic—detecting gender in writing?
Before I dive any deeper, let’s get something out of the way. Despite any of the observations, speculation, and research findings I’m about to summarize, my measured conclusion is that the ability to detect gender in writing is…suspect. I’ll explain why toward the end of this post. But until then, enjoy this short ride through some previous research and opinion.
One more thing: I have not taken a scholarly approach to this piece (by including references, etc.), but if any reader wants to be pointed toward any of the resources I reviewed, send me a note and I’ll try to accommodate.
My instinct on why and how I was making much-better-than-chance gender guesses was based on style: although I could not put my finger on anything specific, there must have been something about the sentence structures or the vocabulary choices that ticked a box on a subconscious level. Perhaps the female and male writers were using metaphor in slightly different but noticeable ways; perhaps one gender was more likely than another to use a particular rhythm or syntax; possibly one gender was more or less rigid about sentence length, or followed a more recognizable pattern when layering clauses and conditionals, or leaned toward particular types of punctuation.
There was nothing I could easily identify, yet there had to be something there.
Or had there? You see, the reason I noticed that I could identify gender was not because I was looking for it across a broad range of writing. Rather, it was because I was reading extensively through several months of back issues of a particular publication. It was entirely possible that I had begun to detect (again, subconsciously) the individual styles of regular contributors, not any specific identifiers related to gender. I may have begun to sense that an article had been written by Emily or Lisa or Carolyn or Tina, rather than by Bruce or Jonathan, not because of any gender flags but because of each of those authors' personal flags. It was also true that the odds were stacked by the sample: a quick check showed that the writing staff at this publication, during the window I was reading through, was about 80% female.
I don’t have the time or resources to make a reliable study of this, but I quickly began to recognize what would be involved in that kind of work: researchers would need to collect writing samples from both men and women, preferably across multiple age and education categories. Those samples would have to be generated on controlled topics, under controlled conditions, with the authors given specific instructions regarding attributes such as the target audience and the level of formality required. And yet, those authors would have to be kept in the dark about the purpose of the study; otherwise every sentence they wrote might be contaminated. A proper study would require hundreds, and more likely thousands, of samples, which would then need to be meticulously analyzed for a wide array of factors: vocabulary, word length, sentence length, reading level—just to begin with. Measuring things like the use of simile and metaphor would be a challenge (a very rough stand-in might be the frequency of certain types of words). Doing this right would not be easy.
I have not delved exhaustively into the research but the studies I’ve reviewed seem to prefer a different—and possibly just as valid—approach. Instead of generating samples for the study, they look at existing material by authors of known gender, build a set of criteria that distinguish male from female authors, then apply those criteria to previously unanalyzed material to see how well their gender guessing succeeds. There’s a lot of data crunching involved, but it seems like a reasonable method (if open to some criticism).
Here’s a summary of one such effort, which analyzed 20 million words and created a list of 128 “significant contrasts.” Those researchers claim the program succeeded at identifying gender 80% of the time, with the primary giveaways being the use of determiners, followed by numbers, modifiers, pronouns, and what the researchers somewhat nebulously labelled “a more interactive style.”
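To make the approach concrete: studies like this typically tally how often an author uses words from particular grammatical categories, normalized for text length, and feed those rates into a classifier. Here is a minimal sketch of that kind of feature counting. The word lists and categories below are invented for illustration; they are not the actual lists or weights from the research summarized above.

```python
import re

# Illustrative word lists -- NOT the actual lists from any study.
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}

def feature_rates(text):
    """Tally category hits per 1,000 words -- the kind of 'contrast'
    feature a corpus study might compute for each author."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty input
    counts = {
        "determiners": sum(w in DETERMINERS for w in words),
        "pronouns": sum(w in PRONOUNS for w in words),
        "numbers": len(re.findall(r"\b\d+\b", text)),  # digit tokens only
    }
    return {k: 1000 * v / n for k, v in counts.items()}

print(feature_rates("She gave them the three books that he wanted."))
```

In a real study, rates like these would be computed over millions of words per author group and then compared statistically to find the categories where male and female authors differ most.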
Those are interesting conclusions and deserve both attention and more effort. It’s also worth noting that 20 million words is the text equivalent of around 200 very short novels (or around 35 works the size of “Infinite Jest”). That might be statistically representative, but possibly not: all publishing channels in the US alone produce at least 600,000 new works each year. You’d probably want to analyze samples across time as well. Other research has looked at a much larger quantity of source material, and also sampled from older work, to set their baselines.
Studies attempting to tackle the gendered language issue have sometimes reached conflicting conclusions. Work done between the 1970s and the 2000s has found all sorts of gender differences in language use: men use more directives, women more questions; women use more words per turn, but men dominate conversations and use more words overall; women use more varied verb forms, more adverbs, and more pronouns; while men use longer words and more articles. Yet other studies have failed to replicate some of these results, or even reached opposite conclusions. I’ll reiterate that there are a lot of interesting tidbits in many of these studies, but drawing firm conclusions from any of them is tricky. Investigators far more informed than I have pointed out that too many of these studies have used very small participant sample sizes or, for reasons of effort, restricted themselves to small subsets of spoken or written language use.
Suffice it to say that there is not a lot of clarity or consensus on this topic. One phrase that I read in a particular study (Newman et al.) seems apropos: “The overall picture…is of a multitude of differences combined with a good deal of overlap between the language of men and women.” The authors were probably aware of the irony of noting this shortly after a discussion of the use of language “hedges” by research subjects.
One of the most interesting things I came across while poking into this topic is a “Gender Guesser” tool, based on an older tool which was in turn built on research which asserted that gender could be reliably determined using statistical analysis. The tool makes no claim to be foolproof, acknowledging (among numerous disclaimers) that it probably gets it right 60-70% of the time.
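Tools of this kind generally work by scoring a text against two lists of weighted keywords and declaring whichever total is larger the winner. The following toy version shows the mechanics; the words and weights are made up for illustration and are not the tool’s actual lists.

```python
# Invented weights for illustration -- not the Gender Guesser's real data.
# Real tools derive these lists from the underlying research corpus.
MALE_WEIGHTS = {"the": 7, "a": 6, "as": 23, "above": 4}
FEMALE_WEIGHTS = {"with": 52, "she": 96, "me": 4, "not": 27}

def guess_gender(text):
    """Sum keyword weights for each side and return the larger.
    Ties (including texts with no keywords) fall to 'female' here;
    a real tool would report a 'weak' or inconclusive result instead."""
    words = text.lower().split()
    male = sum(MALE_WEIGHTS.get(w, 0) for w in words)
    female = sum(FEMALE_WEIGHTS.get(w, 0) for w in words)
    return ("male" if male > female else "female"), male, female
```

Note how crude this is: a handful of function words carries the entire verdict, which goes some way toward explaining the 60–70% accuracy such tools claim, and the odd results described below.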
Still, that’s good enough to have a little fun. The results, though, were enough to give me serious misgivings about the base research.
I ran quite a few samples through this tool, from several genres: straight journalism, technical journalism, adult fiction, young adult fiction, and a few that are more difficult to categorize. While my samples were not pulled with any rigorous attempt at random selection, I made an effort to select widely. Even so, I had great difficulty getting even a single piece to register as unequivocally female.
Numerous samples, by both men and women, rated as “Weak Male,” with the amusing notation “Weak emphasis could indicate European.” The “European” distinction is not clearly explained (a note offers some handwaving about “weak emphasis” in European English vs. American English).
All of the magazine samples referenced at the start of this post rated “Male” (very occasionally “Weak Male”) and that held true with journalistic samples from other sources, regardless of the author’s actual gender. Fiction—and I chose broadly, with an admitted bias toward sampling more women than men—was about the same.
Oddly enough, two excerpts of Hemingway that I ran through the gender guesser came up “Weak Male” and “Weak Female” (of all the samples I tried, that was the only excerpt that scored “Weak Female”). I find this more than a little amusing, as Hemingway is frequently cited as one of the most macho writers of the 20th century. If I had a dollar for every time I’ve heard someone refer to his work as “masculine prose” I would at least be set for coffee this week.
Not a single writer rated as simply “Female.” That is, until I took a dive into one specific genre. When I pulled an excerpt from a book typically categorized as “Women’s Literature” (less flattering: “Romance”), the guesser rated it female when treated as formal writing. Interestingly enough, it was still solidly male (not weak) when treated as informal writing.
While feeding samples into the analyzer, I remembered that I had access to some material which might introduce a new wrinkle. I’ve been involved with a small project that has had writers attempt to fool one another by writing in ways that would make their identity unclear. Part of this includes writers of one gender attempting to produce material that appeared to have been written by someone of the opposite gender. While I only had about half a dozen of those samples available, it’s interesting that every one of them—whether by a man writing as a woman or a woman writing as a man—rated as “Weak Male.” Considering that nearly every published sample by a woman that I fed into the analyzer kicked out a result of Male or Weak Male, I think the effort succeeded. But I’m not really sure. There’s too much uncertainty here.
Now, returning to my earlier assertion, just why do I feel that attempts to detect gender (and any claims of success) are “suspect”?
If we accept that gender is a social construct (separate from “sex”), which evolves over time, then any markers of gender in writing must evolve as well. A marker that leans “male” at one moment in time can lean “female” at another (see the Hemingway example above). Even when (or if) a gender-guessing method works effectively, it will probably only be valid for a short period of time and across a limited pool (genres) of writing. If gender is social and cultural, writing is even more so: combining the two and expecting consistency would be too much to ask. Reliable gender markers that identify a woman in 1975 can’t be trusted in 1925 or in 2005.
I didn’t approach it scientifically, but hints of this appeared in my samples: the more recent the sample thrown at the guesser, the more likely it seemed to rate strongly male—regardless of the author’s actual gender.
Markers could be determined and calibrated for many points in time—and additional genders could be delineated and boundaries adjusted. But I think the question can be legitimately raised: Would this gender information have value?
I don’t know the answer to that question. To me, there is already value in these efforts, but it’s not in trying to pin down gender. Research of this type often seems to uncover interesting things about language use in general, whether or not it tells us anything about gender. The lengths the best of these studies go to in categorizing different aspects of language are potentially much more important than their gender-classification efforts. The granularity achieved in picking apart types of words and tallying their uses, across massive amounts of text, seems more likely to lead to deep and useful insights than fleeting determinations of perceived gender.
Throughout this post I have violated a suggestion I made (and still recommend) some time back: that writers should use “woman” over “female” in most situations. Because the topic here approaches the clinical, it seemed more organic and better style to use “male” and “female” in many places. You’ll note that when I did this the terms were always paired: I did not refer to “men” as “men” but then demote “women” to “females.” The terms always have man/woman or male/female symmetry. In that sense, I have kept to both the general recommendation and the specific one made in that post.