Do you write “like a girl”?

Recently, while catching up on some of my magazine reading, I noticed something surprising: without looking at the byline, I was able to guess whether the author of any given article was a woman or a man with better than 80% accuracy. Better still, I could usually pull this off within the first four paragraphs of a piece.

How was I able to do that? I wondered. What was tipping me off? And what kind of research have people done on this topic—detecting gender in writing?

Before I dive any deeper, let’s get something out of the way. Despite any of the observations, speculation, and research findings I’m about to summarize, my measured conclusion is that the ability to detect gender in writing is…suspect. I’ll explain why toward the end of this post. But until then, enjoy this short ride through some previous research and opinion.

One more thing: I have not taken a scholarly approach to this piece (by including references, etc.), but if any reader wants to be pointed toward any of the resources I reviewed, send me a note and I’ll try to accommodate.

My instinct was that I was making much-better-than-chance guesses on the basis of style: although I could not put my finger on anything specific, something about the sentence structures or the vocabulary choices must have ticked a box on a subconscious level. Perhaps the female and male writers were using metaphor in slightly different but noticeable ways; perhaps one gender was more likely than the other to use a particular rhythm or syntax; possibly one gender was more or less rigid about sentence length, followed a more recognizable pattern when layering clauses and conditionals, or leaned toward some particular type of punctuation.

There was nothing I could easily identify, yet there had to be something there.

Or was there? You see, the reason I noticed that I could identify gender was not that I was looking for it across a broad range of writing. Rather, it was because I was reading extensively through several months of back issues of a particular publication. It was entirely possible that I had begun to detect (again, subconsciously) the individual styles of regular contributors, not any specific identifiers related to gender. I may have begun to sense that an article had been written by Emily or Lisa or Carolyn or Tina, rather than by Bruce or Jonathan, not because of any gender flags but because of each of those authors’ personal flags. The odds were also stacked by the sample: a quick check showed that the writing staff at this publication, during the window I was reading through, was about 80% female. Simply guessing “woman” for every article would have scored about as well as I did.

I don’t have the time or resources to make a reliable study of this, but I quickly began to recognize what that kind of work would involve: researchers would need to collect writing samples from both men and women, preferably across multiple age and education categories. Those samples would have to be generated on controlled topics, under controlled conditions, with the authors given specific instructions regarding attributes such as the target audience and the level of formality required. And yet, those authors would have to be kept in the dark about the purpose of the study; otherwise, every sentence they wrote might be contaminated. A proper study would require hundreds, and more likely thousands, of samples, which would then need to be meticulously analyzed for a wide array of factors: vocabulary, word length, sentence length, and reading level, just to begin with. Measuring things like the use of simile and metaphor would be a challenge (a very rough stand-in might be the frequency of certain types of words). Doing this right would not be easy.

I have not delved exhaustively into the research, but the studies I’ve reviewed seem to prefer a different—and possibly just as valid—approach. Instead of generating samples for the study, they look at existing material by authors of known gender, build a set of criteria that distinguish male from female authors, then apply those criteria to previously unanalyzed material to see how well their gender guessing succeeds. There’s a lot of data crunching involved, but it seems like a reasonable method (if open to some criticism).

Here’s a summary of one such effort, which analyzed 20 million words and created a list of 128 “significant contrasts.” Those researchers claim the program succeeded at identifying gender 80% of the time, with the primary giveaways being the use of determiners, followed by numbers, modifiers, pronouns, and what the researchers somewhat nebulously labelled “a more interactive style.”

Those are interesting conclusions, and they deserve both attention and further work. It’s also worth noting that 20 million words is the text equivalent of around 200 full-length novels (or around 35 works the size of “Infinite Jest”). That might be statistically representative, but possibly not: publishing channels in the US alone produce at least 600,000 new works each year. You’d probably want to analyze samples across time as well. Other research has looked at a much larger quantity of source material, and has also sampled older work, to set its baselines.

Studies attempting to tackle the gendered language issue have sometimes reached conflicting conclusions. Work done between the 1970s and the 2000s has found all sorts of gender differences in language use: men use more directives, women more questions; women use more words per turn, but men dominate conversations and use more words overall; women use more varied verb forms, more adverbs, and more pronouns, while men use longer words and more articles. Yet other studies have failed to replicate some of these results, or have even reached opposite conclusions. I’ll reiterate that there are a lot of interesting tidbits in many of these studies, but drawing firm conclusions from any of them is tricky. Investigators much better informed than I am have pointed out that too many of these studies have used very small participant samples or, to limit the effort involved, restricted themselves to small subsets of spoken or written language use.

Suffice it to say that there is not a lot of clarity or consensus on this topic. One phrase that I read in a particular study (Newman et al.) seems apropos: “The overall picture…is of a multitude of differences combined with a good deal of overlap between the language of men and women.” The authors were probably aware of the irony of noting this shortly after a discussion of the use of language “hedges” by research subjects.

One of the most interesting things I came across while poking into this topic is a “Gender Guesser” tool, based on an older tool which was in turn built on research which asserted that gender could be reliably determined using statistical analysis. The tool makes no claim to be foolproof, acknowledging (among numerous disclaimers) that it probably gets it right 60-70% of the time.

Still, that’s good enough to have a little fun. The results, though, gave me serious misgivings about the underlying research.

I ran quite a few samples through this tool, from several genres: straight journalism, technical journalism, adult fiction, young adult fiction, and a few that are more difficult to categorize. While my samples were not pulled with any rigorous attempt at random selection, I made an effort to select widely. Even so, I had great difficulty getting even a single piece to register as unequivocally female.

Numerous samples, by both men and women, rated as “Weak Male,” with the amusing notation “Weak emphasis could indicate European.” The “European” distinction is not clearly explained (a note offers some handwaving about “weak emphasis” in European English vs. American English).

All of the magazine samples referenced at the start of this post rated “Male” (very occasionally “Weak Male”) and that held true with journalistic samples from other sources, regardless of the author’s actual gender. Fiction—and I chose broadly, with an admitted bias toward sampling more women than men—was about the same.

Oddly enough, two excerpts of Hemingway that I ran through the gender guesser came up “Weak Male” and “Weak Female” (of all the samples I tried, that was the only excerpt that scored “Weak Female”). I find this more than a little amusing, as Hemingway is frequently cited as one of the most macho writers of the 20th century. If I had a dollar for every time I’ve heard someone refer to his work as “masculine prose” I would at least be set for coffee this week.

Not a single writer rated as simply “Female.” That is, until I took a dive into one specific genre. When I pulled an excerpt from a book typically categorized as “Women’s Literature” (less flattering: “Romance”), the guesser rated it “Female” when scored as formal writing. Interestingly enough, it still rated solidly “Male” (not “Weak Male”) when scored as informal writing.

While feeding samples into the analyzer, I remembered that I had access to some material which might introduce a new wrinkle. I’ve been involved with a small project that has had writers attempt to fool one another by writing in ways that would make their identity unclear. Part of this includes writers of one gender attempting to produce material that appeared to have been written by someone of the opposite gender. While I only had about half a dozen of those samples available, it’s interesting that every one of them—whether by a man writing as a woman or a woman writing as a man—rated as “Weak Male.” Considering that nearly every published sample by a woman that I fed into the analyzer kicked out a result of Male or Weak Male, I think the effort succeeded. But I’m not really sure. There’s too much uncertainty here.

Now, returning to my earlier assertion, just why do I feel that attempts to detect gender (and any claims of success) are “suspect”?

If we accept that gender is a social construct (separate from “sex”), which evolves over time, then any markers of gender in writing must evolve as well. A marker that leans “male” at one moment in time can lean “female” at another (see the Hemingway example above). Even when (or if) a gender-guessing method works effectively, it will probably only be valid for a short period of time and across a limited pool (genres) of writing. If gender is social and cultural, writing is even more so: combining the two and expecting consistency would be too much to ask. Reliable gender markers that identify a woman in 1975 can’t be trusted in 1925 or in 2005.

I didn’t approach it scientifically, but hints of this appeared in my samples: the more recent the sample thrown at the guesser, the more likely it seemed to rate strongly male—regardless of the author’s actual gender.

Markers could be determined and calibrated for many points in time—and additional genders could be delineated and boundaries adjusted. But I think the question can be legitimately raised: Would this gender information have value?

I don’t know the answer to that question. To me, there is already value in these efforts, but it’s not in trying to pin down gender. Research of this type often seems to uncover interesting things about language use in general, whether or not it tells us anything about gender. The efforts the best of these studies go through to categorize different aspects of language are potentially much more important than their gender classification efforts. The granularity achieved in picking apart types of words and tallying their uses, across massive amounts of text, seems more likely to lead to deep and useful insights than fleeting determinations of perceived gender.



Throughout this post I have violated a suggestion I made (and still recommend) some time back: that writers should use “woman” over “female” in most situations. Because the topic here approaches the clinical, it seemed more organic and better style to use “male” and “female” in many places. You’ll note that when I did this the terms were always paired: I did not refer to “men” as “men” but then demote “women” to “females.” The terms always have man/woman or male/female symmetry. In that sense, I have kept to both the general recommendation and the specific one made in that post.


Posted in Culture, Language, Things you should know, Words, Writing

Should You Spell Out Numbers In Legal Documents?

I have mentioned in the past that some of my editorial and writing work has included documents for law firms or related legal applications. Legal filings are beyond my expertise, but I’ve done a lot of work for use on law firm sites, usually targeted to other lawyers or to potential clients.

Competent writing of that sort often requires reading up on a topic, and that research brings me into contact with filings and rulings, as well as many other legal documents. Legal writing can be dry; it can be boring. At its worst, it can be downright terrible: not merely overblown and tortured, but also virtually unintelligible even to a knowledgeable and interested reader.

On the other hand, good legal writing can be both interesting and entertaining (as Judge Posner and others have repeatedly shown).

One of the bad things that I only rarely have to suffer through in legal writing is the completely unnecessary spelling out of numbers. Few things in professional writing make me truly angry, but this is one of them.

I’m not talking about spelling out “six” for “6” or even the occasional “twenty-five” for “25” (the first of my examples would be recommended by virtually every accepted style; the second would be acceptable to a large minority). Oh, no: I’m talking about documents in which a year, such as 1997, or a dollar amount, such as $3,150,621, is spelled out: One thousand, nine-hundred, ninety-seven; three million, one hundred fifty-thousand, six-hundred, and twenty-one dollars (…gratuitous commas and “and” added to stress the point).

This kind of thing is just awful, and there is no excuse for it. It is not a recommended (or even condoned) practice in any major style guide, not even the legal ones such as The Bluebook or The Redbook. Bryan Garner, the driving force behind The Redbook, gives essentially the same advice there that he does in the more general-audience Garner’s Modern English Usage:

“The best practice is to spell out all numbers ten and below and to use numerals for 11 and above.”

Major style guides—and in-house styles—generally agree, but might differ in the details, spelling out numbers through 12, or 20, or 100. But the principle is the same. You’ll be hard-pressed to find an authority that dictates spelling out “three thousand four hundred seventy-seven.” (There are a handful of exceptions in every style, such as spelling out numbers when they start a sentence. I won’t try to catalog the exceptions; just follow whatever style is being used for your document.)

In any writing, spelling out large numbers instead of using numerals is annoying. In legal writing, it’s worse. Getting it onto the page correctly in the first place requires extra effort, and then it demands extra proofing. The possibility of error increases at every stage. Beyond that, from a reader’s point of view this approach makes a document harder to read. If only a small number of uses are involved, it might not be too much of a problem. But the more that this sort of thing is done, the more trouble it can cause.

I’m bringing all this up as prelude to a document of staggering opacity that I recently had the misfortune to review. The document defines the physical limits of a parcel of land subject to a proposed city zoning change.

It’s included here, with most of the identifying information stripped out. This is less a strictly legal document than a government one, but the idea is exactly the same: someone felt the need to spell out the large numbers used to describe a lengthy series of property boundary measurements, and the result is an abomination.

Take a look, but don’t feel pressured to study the entire thing.

Not only is this verbal description of the affected area extremely difficult (and tedious) to read but, having had to read this document and compare the described measurements to the map of same, I submit that there is an error in the text description. (I don’t think this error can be discovered without walking through the text in tandem with the map, so don’t exert any effort trying to find it.)

Let’s not even talk about the capitalization, or why the author of this horror felt that distances needed to be spelled out but compass directions did not, or why the area (square footage) at the end of the list is not spelled out.

I had trouble finding any reputable source—and I want to be very clear on this: any reputable source—that requires, specifies, encourages, justifies, or even tolerates this practice. Even in this manual (my linked boundary document was produced in Massachusetts, so I’ll assume it follows Mass rules), there’s no reference to this kind of thing. I probably shouldn’t take that as gospel, though: the ‘General Laws of Massachusetts’ is a document that’s an egregious offender when it comes to spelling out numbers instead of using numerals. Compare these two sample pages to see if you can uncover any guiding principle in the good old “M.G.L.”

If any reader has information to show that some style or actual law dictates this practice, please share it.

As far as my research (as always, limited) reveals, the origin of the practice of spelling out all numbers, or spelling them out and then including the numerals in parentheses, is lost to history.

Some say that it is purely an archaic practice: numbers were written out by people who had limited literacy and even less numeracy. My working contention is that this theory makes sense only if the practice dates back to a time and place when literacy was very low, so that writing, including the writing of numbers, was unusual and reserved for important documents, and to a time before the widespread adoption of Arabic numerals. That would mean the tradition began no later than the 16th century.

I’m not explicitly debunking this idea. It would need to meet some conditions, though. Do I think it could meet them? Yes. The question would not be why the practice started, but why it has lingered on so long.

Others suggest that it is (or was) a long-standing anti-fraud measure: numerals alone would be relatively easy to modify in a document without great risk of detection, but altering the numerals and the spelled-out version would introduce a greater risk of the fraud being easily spotted (by causing a problem with the document’s original word spacing and sizing, for example). As proof of this, commenters often cite the ‘fact’ or ‘law’ that when there is a discrepancy between the words and the numerals on a bank check, the words win.

Part of this is indeed true. Something called the Uniform Commercial Code governs the form and validity of transactions such as bank checks, and it specifies that spelled-out numbers take precedence over numerals when the two disagree. This isn’t a federal law, but it has been adopted in very similar form by every state.

Maybe that’s why this practice exists: some people have adopted it as a general anti-fraud measure. And others, whether they’re aware of the anti-fraud idea or not, have copied that form, believing that it was a necessary or recommended style to use in important documents. Both could be true, whether or not it’s effective as an anti-fraud measure, and whether or not it makes any document seem “more official.”

Spelling out the numbers on a check is still fairly common practice. Do you do it? I do, but not because I think of it as an anti-fraud measure: it’s just the way I was taught to write checks, and I’ve never changed it.

This blog is frequently about recommendations for best practice, so let’s wrap up with one.

When it comes to advice on what to do when writing numbers in other documents—even legal documents—this isn’t a hard call. Don’t spell out large numbers. It’s extra work for you, extra work for your readers, and it makes it more likely that errors will slip into your documents. Keep it simple and use numerals whenever possible (as directed by the style guide you’re using).

(Minor edit, 2 August 2021.)

Posted in Things you should know, Writing

Mentor, mentee — how about mentoree?

One of the positives (or is it a negative?) of working with language so much of the time is the sheer volume of unusual things you encounter. Not simply an odd word or usage here and there, but also the debates people have over those same things—as well as the feelings they express about them.

A term I’ve seen stir up that kind of debate and feeling a handful of times is “mentee,” the companion term to “mentor.” It’s used to identify the one who is mentored.

A fair number of people don’t like this term (and some really don’t like it). In full disclosure, I’ve never been a fan. On the other hand, I don’t encounter it a lot and I don’t feel very strongly about it. I believe there are equally good or better words in most cases, and I’ve long been nagged by a feeling that there’s something ‘wrong’ about the word, but I never had a reason to look into it or to care very much.

Mentee popped up on my radar a couple of weeks back when I ran across it in an online news story (something from CNN, I believe, but I didn’t record the source). Reading it, my thought was “oh…has ‘mentee’ been mainstreamed enough that it’s acceptable in online news?” Asking the question already gives the answer.

Encountering the word in an online news source is one thing, but what about more conservative outlets? Has it been embraced by “the paper of record” or others that are typically slower to accept language change? It was time to find out. First, a few words on mentee.

Mentee is a moderately maligned word used to identify a pupil or student. Other synonyms, near and far, include advisee, acolyte, apprentice, and protégé, to name a few. In most usage, a mentee is paired, explicitly or implicitly, with a mentor. The word “mentor” itself is relatively modern, only taking on this use in the early 1700s after the popularity of a 1699 French novel that featured Mentor, a character from the Odyssey. Mentor, as a teacher, was adopted into French, English, and German within a few decades, and into Italian and Spanish within a few more. The companion to mentor, mentee, was not documented in print for more than two centuries after that.

Major dictionaries and similar sources agree on these basic facts. Garner’s Modern American Usage (GMAU) notes that “The main oddity about the [mentor/mentee] pair is that unlike most pairs ending in -or and -ee, these are not from a verb stem….There is no verb *to ment.” GMAU goes on to suggest that the pair can be included among vogue words. (The “vogue words” entry is one of the more judgmental and dismissive in GMAU, and it is not worthy of the work’s usual high editorial standards.)

Moving on…

For those of you preparing to march into battle to beat back mentee, consider what you’re up against: it was a trivial matter for me to find occasional uses of it in the New York Times back to 2010, with rare uses back to at least 1995. That earliest find turned out to be an ironic secondary citation: it was a letter complaining about an appearance of the word in the Times a week earlier. The correspondent did not approve of William Safire’s use of mentee but was unaware of the trap he’d stumbled into.

A little more digging found that Safire had been taken to task for using this word before (in 1991). Even at that earlier date, he admitted that he’d used mentee with the hope of receiving complaints (he did), so any later uses, such as the one that received the 1995 complaint, were surely made with provocation in mind. Safire traced the first print use of the word to 1978. The last two paragraphs of that 1991 column show Safire’s reasoning on why the word should be accepted. Safire had used the word in print at least as far back as 1980 with no direct comment on its form: he spent most of that particular column on the verbification of nouns.

An interesting aspect of one of the complaint letters is that it makes the same point as GMAU: mentee is an aberrant -ee form because it doesn’t follow the pattern of being formed from a root verb. (Garner only points out the lack of a verb root; the Times correspondent scolds over it. It’s a good example of what the old descriptivist/prescriptivist debate looks like.) Garner’s thoughts on this are clear and useful, and reading them alerted me to exactly why I’ve always felt there was something off about mentee.

The classical Greek origin (“Mentor” as a name) aside, my personal lack of enthusiasm for mentee has had more to do with what I perceive as a certain inherent incompleteness or roughness in the word. It seems to me that mentee would work better as “mentoree.” That might also satisfy at least some of those who are obstinate on the “Mentor as a name, not a verb” point: it would still not be derived from a verb, but the origin would be clearer.

I’m not alone in this line of thinking. A quick search uncovers a substantial number of uses of mentoree and possibly a few champions. It is clearly the less popular form, though. When it comes to sussing out origins and first use, the best I could uncover for mentoree was a use from the long-ago realm of 1972. The term seemed to have a brief period of popularity in the 1980s and early 1990s, and it’s been continuously in use somewhere, but it’s never gained traction. It turns up at a steady but very infrequent level of use—about 1% of the frequency of mentee, which is not exactly common itself. I couldn’t find mentoree in any mainstream dictionary.

Most dictionaries give a first use for mentee from 1965. A few sketchier online sources cite a use from 1958, but that doesn’t seem reliable. The use of both words (mentee, mentoree) was vanishingly small before the 1980s. Use of mentee has risen sixteenfold since 1990 (it’s still uncommon).

If you’re not already used to mentee, it’s probably past time you got accustomed to it. It doesn’t seem like a word that’s going away.


For some provocative thoughts on the politics and ideology of Mentors, mentors, and mentees, take a look at this.

Posted in Language, Words

On Behalf of Jamais Vu

Do you ever have an experience where, without warning, something that you’re familiar with feels suddenly like you’re encountering it for the very first time?

There’s actually a term for this: it’s jamais vu, the opposite of déjà vu. Instead of the feeling of having seen something before (when you never have—that’s déjà vu), jamais vu is the feeling that you’re encountering something for the very first time (when you know that, in fact, you have encountered that thing many times).

Déjà vu is a relatively new term in English, first borrowed from the French around 1903 according to sources. I’ve come across a few fuzzy references to suggest that jamais vu has been around for about the same amount of time, but it doesn’t appear in any dictionary I’ve consulted. My searches turned up some French uses early in the 20th century, but I can’t confirm a strictly English use prior to 2000.

Every once in a great while I’ll have this jamais vu experience with a word or phrase. This is not terribly surprising, as some of the few experiments on jamais vu have used rapid word repetition to trigger it. Note this one especially.*

It most recently happened to me with the word “behalf.” I had typed the word out several times in the course of editing a document, and somewhere around the fourth or fifth use I looked at it and thought “behalf…that’s not really a word, is it?”

That’s a very strange experience to have, going from automatically using a string of characters with a straightforward meaning to wondering whether you just made the word up and it means absolutely nothing. It’s especially disconcerting when you can’t immediately dispel the feeling, which is what happened to me in this case. After pondering the last appearance in the document, then deleting and retyping it several times, I was only able to shake the feeling by looking behalf up in an online dictionary to prove that it existed (and that I was using it—and spelling it—correctly). It was a very strange few moments indeed.

Jamais vu aside, behalf still strikes me as one of the odder words in English. In fact, it is. It’s one of those linguistic relics that dates back a very long way, but has only an extremely narrow and limited existence in current English: it’s only used as part of the phrase “on behalf of” (rarely “in behalf of”) and occasionally in the construction “on <some entity’s> behalf.”

According to the OED, the noun behalf was originally a phrase (“be healfe”) meaning “by the side” but then fell into use mostly as a preposition—the same way we might use around or near or…before, behind, below, beneath, beside, between, or beyond. It turns out that those “be-” prepositions have similar origins to behalf. Some of their meanings have wobbled or changed over the centuries before settling into their current definitions, but they all began as (and for the most part still function as) words that indicate relative position. The verb begin, which slipped into the previous sentence (began) almost without my noticing it, shares some roots with these others in the murky mists of etymology.

Behalf has been documented since 1300 or so, but in the past two centuries it has been rarely used except in the sense discussed here. It once had a plural form (behalfs), but that fell away with the word’s older uses.

The etymology above is summarized from the OED, but Merriam-Webster has a note regarding on behalf vs in behalf that I found to be of interest: “A body of opinion favors in with the ‘interest, benefit’ sense of behalf and on with the ‘support, defense’ sense. This distinction has been observed by some writers but overall has never had a sound basis in actual usage.” Note that for the “usage superstitions” file.

When someone acts on behalf of someone else, it means that they act or speak in the interest of, for the benefit of, or to intercede for that other party. Acting on your own behalf…well, that meaning is self-evident.

My instinct is that you’ll most commonly encounter behalf in and around the legal system: in filings, transcripts, summaries of proceedings, and so forth. It also gets rolled out a lot in what I think of as public ceremonial settings: when someone makes a statement or takes an action on behalf of a larger group. “I thank you on behalf of all residents of the city,” or “The entire class of 2021 appreciates your efforts on our behalf.”

Not much more needs to be said on behalf of behalf. My jamais vu experience aside, it’s a real word, it’s an old word, and it’s a word used only in a few very specific ways.


*And if you read that article title without catching the word duplication then you’ve got some work to do before taking a job as a copyeditor or proofreader!

Posted in Language, Words

Shibboleth — not someone who predicts the future and not a Lovecraft monster

I’ve been swimming into the depths of political speech again lately, and that means I encounter interesting words and concepts that aren’t always common in everyday use. One of the words that crops up now and then is “shibboleth.”

I can’t confirm that I’ve ever had reason to put this word into print before today (and it’s certainly not one I use in casual conversation). That’s not a huge surprise: Oxford rates this word as appearing somewhere in the range of once in every 1 million to 10 million words (Band 4; Oxford in turn bases its frequency scores on the Google Ngrams corpus since 1970).

Despite the unusual (and almost alien) nature of the word to an English speaker, it’s been around for a long time and describes a useful linguistic and social concept.

A shibboleth, narrowly, is a kind of linguistic password: it reveals something about the speaker’s identity. In the strict traditional sense, it’s used by someone in one group to recognize a person in another group.

Being no biblical scholar, I’ll accept the Encyclopedia Britannica’s estimate that the Book of Judges, the part of the Bible where the word is first recorded, dates to around 550 BCE. Oxford dates the first use of shibboleth in English to a Bible from 1382, with the word migrating into more figurative use by the first half of the 17th century.

The biblical origin of the word is pretty nasty when judged by modern standards: it recounts (some might say glorifies) an occasion when one tribe identified thousands of defeated enemy soldiers, who were attempting to surreptitiously escape across the border, by their pronunciation of a single word (“shibboleth”) and then killed them. According to the story, people of the two warring tribes pronounced the word differently, with those of one tribe unable to make the “sh” sound and defaulting to “sibboleth.” The incident is in chapter 12 of the Book of Judges (start with chapter 11 for the whole story).

Historically, the idea appears to have been used repeatedly, and often with unpleasant results. One Wikipedia entry includes a list of known and suspected incidents from history.

Less strictly, shibboleth is used to denote some verbal or behavioral indicator that flags someone as part of a group. Oxford’s definition extends this to include a particular manner of dress or the use of professional jargon. In the less strict sense, a shibboleth can be used to identify someone in the “in” group just as easily as someone in the “out” group.

Shibboleths can be useful, but they aren’t foolproof. That same Wikipedia entry, for instance, also includes a list of US place names that are frequently used as shibboleths to sort locals from non-locals. That sort of knowledge can be learned at a distance these days. Having lived near some of those places, and having done business with people in several of the others, I’ve learned to change my default “outsider” pronunciation of them. I’m aware of a number of other such place names, including the city I currently live in, Waltham, Massachusetts.

Regionalisms can also function as shibboleths; Pittsburgh’s “yinz” and Rhode Island’s “what cheer?” spring to mind. They don’t work in exactly the same way, but how people unfamiliar with them react can be similarly revealing. Again, a little bit of knowledge and effort can often go a long way: I attended college in the South (the “shallow” South, not the Deep South) and within a few months had identified about half a dozen subtle differences in pronunciation that gave me away as “a Yankee” to those students who cared about such things. It didn’t take much to camouflage these “tells,” and that knowledge still serves me well in some social situations. To be clear: this isn’t exactly the definition of a shibboleth, but I think the similarity of ideas helps demonstrate what we’re talking about.

In contemporary American culture, the far right in general and QAnon in particular are known for many shibboleths; that’s a deep rabbit hole not suitable for discussion here. To be completely fair, though, it’s pretty easy to find shibboleths in groups across whatever spectrum you’re looking at, whether that’s political, class, professional, or some shared interest.

Shibboleth is also the name of an open-source software product described as “one of the most widely used identity management systems in the world.” It strikes me as a curious name choice, considering the word’s origin and modern definition. But if you read shibboleth as “a kind of linguistic password,” the name makes more sense.


The references in the title of this post are to “sibyl,” a Greek-derived term for an oracle or prophetess, and “shoggoth,” a species of dangerous, intelligent, amorphous creatures that appear in H. P. Lovecraft’s “At the Mountains of Madness.”

Posted in Language, Words