Linguistics for the World-Builder
Constructing a believable, habitable universe is one of the first things a writer of speculative fiction has to tackle. The idea of the world, its shapes and peoples, is necessary to telling any story, but it's especially tricky when you're playing with reality. Both SF and fantasy have to address the problem of suspending disbelief for the reader and immersing them in a world, whether it's similar to their own or radically different. Writers pay attention to their physical setting much of the time, describing sensible objects like terrain, smells, plants, animals, weather. They build a panoramic view of their world for the reader.
Problems arise, however, in believably populating this real-seeming place with real-seeming people. What makes a person real, or a society? There are obvious questions to be answered: their belief systems, their biases (sexual, racial, economic; there will be at least a few), their relationships, and the variety within those among an appropriately varied population. There is also the question of language, the complex phenomenon of human communication. An understanding of language and its science, linguistics, is invaluable to a world-builder. It changes one's entire way of looking at how people really speak.
Linguistics is a multi-faceted and theoretically dense science, one worth dedicated study, but for the curious writer there are several immediately useful, fun concepts in socio- and psycholinguistics that can be used to add interest and complexity to fiction. Psycholinguistics deals with the relationship between language and cognition, words and thought, whereas sociolinguistics deals with how people use language in society. A crash course on some linguistic theories might give the writer a whole new box of tools for world-building, with the caveat that great world-building is no replacement for telling a good story; it just helps.
Variation and Codes
Within one language, there will probably exist several codes (a neutral term indicating any kind of system two or more people use for communication). These can include dialect, vernacular, standard language and also socially reinforced codes. Language is not so simple as "English" or "Spanish," or even "Elvish." Speakers who feel that they only have one language will still have multiple codes, though they themselves may not be aware of it.
Writers often employ such distinctions by giving a character an obvious dialect, using dropped syllables or phonetic slurrings like "gonna" or "aintcha." The problem with this is that it's an overly simplistic way of presenting dialect, used to denote the "otherness" of a given character. A better example of dialect as it actually functions in real life is Nalo Hopkinson's Brown Girl in the Ring, which pays close attention to the less obvious but more intrinsic aspects of a real dialect: grammatical and lexical changes.
"Is what do you, child?"
"Oh God, Mami, Tony gone, I ain't know where, and I black out again, and when I wake up, it had Crack and Jay and Crapaud lying on the ground, and the rose dead, and Crapaud… Crapaud… Mami, everything go wrong."
Dialect is not just dropping a letter here and there — it's about word-order, pronoun use, and any number of other grammatical differences.
Dialect is also not set in stone: while it may be a character's comfortable, personal speech, they also possess other codes. Some, called respect varieties, are used in social situations when the speaker is addressing someone of a higher status or more official position than themselves. Others are religious or ethnic codes used only within the shared speech community, or the "genderlect," a much-argued speech variation that some linguists have claimed to detect when people of one gender are isolated in groups together. So, while your character may speak a "southern" dialect in a group of friends, in front of their boss or in a court of law they will assuredly attempt the standard language variety instead of using their comfortable dialect. They may not succeed, but they will try; social pressure sees to that. Language is a social device, and as such, it is nearly impossible to separate it from its social dimensions.
All of these varieties can also be grouped into what linguist Basil Bernstein described as elaborated and restricted codes. The elaborated code is a language variety with a wider range of lexical items (words) available in it, which can communicate complex ideas between speakers unfamiliar with each other. The restricted code functions in the opposite way; it lacks lexical variety and cannot be used to express the same wide range of ideas. The restricted code is often used within tight-knit groups: between partners, among friends, in religious groups, in militaries, et cetera. There is also a level of class definition with elaborated and restricted codes: every person speaks a restricted code, but not every person will have the opportunity or available education to learn the elaborated code. Thus, dialect is often a restricted code, while the standard language variety is an elaborated code.
When writing dialogue, codes should be in the forefront of the writer's mind: which characters speak which codes, and when, and with whom? They shouldn't always speak in exactly the same way. If the language is a construction of the writer's own devising or the book is not contemporary, the writer must either research the codes or devise them. A character who is monolingual doesn't speak only one way. Including codes lends dialogue an authenticity that it will otherwise lack. The reader might not be consciously aware of the speech constructions, but they hear them every day in their own language. The reality will seem smoother.
Multi-Lingual Societies
It is not uncommon for a community/society (due to colonialism, shared borders, shared religion, or any number of factors) to be multilingual. However, being multilingual isn't just about speaking two or more languages as mother-tongues. As with other types of codes, the social variations and pressures that accompany multilingual societies are a driving factor in speech and usage.
If the created world demands a multilingual society, there are many possible ways for this to manifest, and each has different implications for the way people speak. As with other codes, which language is spoken is a matter of domain: it is interactive, depending on whom the speaker is addressing and in what situation. The lines are often blurry in these domains, resulting in a multilingual phenomenon called code-switching. This is when, in the course of a single conversation or even a single sentence, the multilingual speaker flips without acknowledgement (often without noticing it themselves) between their languages. An excellent example of this in textual action is Maureen F. McHugh's China Mountain Zhang, where Chinese and English flow into and around each other in the characters' speech.
"We each and each respect, dui budui?"
"Dui," I say. Right.
"Here, you tech engineer, job so-so."
"Bu-cuo," I answer. Not bad.
"I have daughter," Foreman Qian says. "Request you to my home come, meet her, hao buhao?"
This is, obviously, a difficult trick to manage in a narrative context where the reader may not understand both of the languages the writer wants to use.
Another option available to the world-builder is the stricter phenomenon of diglossia, which occurs in a stable language environment where, in addition to the usual dialects, there is also a highly codified split between two language varieties. One is the High variety, used in education and official documents and often supported by a wealth of literature or historical respect. It is not generally a conversational or commonly spoken language. That role is filled by a Low variety, often the language of the working class and split off from the literary and historical traditions of the country. For example, in Switzerland the High variety is Standard German, taught in schools and codified in textbooks (more an elaborated code), and the Low variety is Swiss German, a local language variation that is not fully intelligible to a speaker of only Standard German. This is a complex situation to try to use in a fictional setting, but it does occur in reality.
A third multilingual possibility is that of pidgin or creole language developments that occur in overlapping trade societies or in colonial situations. A pidgin is a restricted code made up of mutually intelligible words and used between speakers of two separate languages who come into common, and sometimes forced, contact. Refugees, victims of colonization, and similar peoples often use a pidgin language. It is not technically a functional language and often contains only lexical items while lacking a working grammar. A creole occurs when the pidgin language begins to deepen and develop through continued exposure, until there are mother-tongue speakers of a hybrid language created from both parent languages. These mixed languages, especially in worlds where perhaps an alien and human civilization are trading, or where one has invaded the other, can be especially illustrative of social developments and character interactions. The pidgin can be used to show language barriers, trouble with communication on a greater scale, while the creole possibility illustrates the combination of two societies into something new.
The Words Themselves
Though the overarching shapes and patterns in a language are important for authenticity, the word-by-word level is equally important. It's commonly known that languages seem to reflect their surroundings, but the reasons why are fascinatingly layered.
One classic psycholinguistic theory dealing with that interrelation is the Sapir-Whorf hypothesis. It's been expanded and adapted in recent years to include further evidence and is no longer used in its original form. The basic supposition of the hypothesis is that differences in ways of thinking of things are due to the language itself. Thus thinking of "slush," "snow," "sleet," and "flurries" as either different or the same is a consequence of the words available to us in our language system. However, this does not mean that language restricts thinking; that part of the hypothesis has been roundly discarded. There is also the question of what came first — does a plains-dwelling people develop a language with only one word for hill because there are no mountains, or do they consider all mounds, hills and mountains to be "hills" because of the language they grew up with?
There must have been a point when, as a language split off from its root due to geographical or other changes, words began to either die off or grow anew to reflect the new surroundings. For those generations, as the language slowly changed, the words were changing to reflect the surroundings. Some words weren't needed any longer, and there were some things that had a need for a new word. Kerry Tynan Fraser contributed an article on this phenomenon to the February issue of Clarkesworld, "Neologism and Linguicide." Some of the ideas on language change it posits are found more in pop-science than in current linguistic theories. There are patterns in how new words are created which remain similar in all code-systems. Fraser's article relies on Bill Bryson's theories, but for further reader reference William McGregor's Linguistics: An Introduction offers a more common pattern-system, one which is currently taught to students of linguistics. For a writer building a fresh society, the language-evolution part of the process can be integral. What words grow out of the new situation? Which words fall out of usage or become combined with other words?
Further generations after an initial change, language can, theoretically, affect the way the mother-tongue speakers see the world around them. Take time words, for example. In societies without words for linear time or grammatical markers for tenses, the social concept of time is often cyclical. There are no words to explain linear time processing. Often there is no way to tell what came first. Did the language, as it grew from its original root, drop linear time words to reflect the philosophy of the people? Or, alternately, are the people unable to express linear time because the language they learned did not have the words for it to begin with? There's generally no way to tell for real-life languages, but a writer can play with this knowledge in the world-building. Maybe a new human society lives on a space station for generations and begins losing words for physical characteristics of Earth, like weeds or puddles. Or, maybe a person raised in this society has never known those things and has no words for them, so they would not think "that's a weed" if they saw one. They might simply think "plant" or "grass."
One remarkable example of an adaptation of Sapir-Whorf comes from later work on "thinking for speaking" by Stephen Levinson and John Haviland, which examines the use of spatial terms. Their study found that speakers of a language that uses compass-point directions (north, south, east, west) and speakers of a language that uses body-oriented directions (left, right) differ in their thought processes regarding spatial relations. Asked to pick up items off one table, turn around, and rearrange them on a table that was previously behind them, the two groups organized the items differently. The compass-direction speakers oriented their items according to north-south-west-east. The body-oriented speakers flipped the objects to mirror the way they were facing on the other table, putting the same objects to their personal right and left, which is different from the compass directions. This isn't a conscious action on their part. It shows that the way they learned language, and what words their language offers them, actually affected their perception. The possibilities for using this in fiction are fantastical. In a space-faring society, how would a person orient their directions? Likely by the way their language did so, but how will the writer choose to arrange that language? It's all world-building, and with each word the writer chooses they can change the whole cognitive arrangement of their character, if they choose to apply some linguistic science in their planning.
The authenticity of any world a writer might create derives from the language they create it with. Books are made of words, after all; they rely on language uniquely. They offer no body language to assist the reader. If the words are right, and the patterns are universal, recognizable even subconsciously, then the writer will succeed in convincing the reader. Attention to language is the only way to do it effectively.
For each of these theories that intrigues, be assured that there are ten more equally engaging that aren't represented here — for example, the much-argued idea of language "death." Another topic that might be of value to the curious is language acquisition — how people learn language — which has both biological and social aspects. It is a fascinating topic that strikes at the heart of what allows us to communicate at all, and examines the consequences of isolation, abuse and childhood development in relation to language.
Linguistics is the science of language, much like story-telling is the art of language. They go hand in hand.
Bibliography
Bernstein, Basil. "Elaborated and Restricted Codes: Their Social Origins and Some Consequences." American Anthropologist 66 (1964): 55-69.
Hopkinson, Nalo. Brown Girl in the Ring. New York: Aspect, 1998.
McGregor, William. Linguistics: An Introduction. New York: Continuum, 2009.
McHugh, Maureen F. China Mountain Zhang. New York: Tor, 1992.
Wardhaugh, Ronald. An Introduction to Sociolinguistics, 4th edition. New York: Wiley-Blackwell, 2002.
ABOUT THE AUTHOR

Brit Mandelo is a writer and occasional critic. Her primary fields of interest are speculative fiction and queer literature, especially when the two coincide. Also, comics. Her fiction has recently appeared at Tor.com ("Though Smoke Shall Hide the Sun"). She is a Louisville native and lives there with her partner in an apartment that doesn't have room for all the books.
ISSN 1937-7843 Clarkesworld Magazine © 2013 Wyrm Publishing. Robot illustration by Serj Iulian.
C12VT wrote on April 8th, 2011 at 9:05 am:
Great article! Worldbuilding languages is fun.
One thing I like to think about is what expressions someone from a different world would use. Many idioms in English wouldn't make sense in other settings; e.g. someone growing up on a spaceship might not use phrases like "growing like a weed" or "look a gift horse in the mouth" if these are objects they aren't familiar with; they might not describe someone as having a sunny disposition, or mooning over a lover, if they don't have the same relationship to these astronomical objects as we do. But they would probably have their own expressions that served the same purpose, yet reflected their environment.
On the other hand, people in real life use expressions like "don't look a gift horse in the mouth" even if they've never been close to a horse and don't understand the origin of the phrase. We still dial a phone even if it doesn't have a rotary dial.
Clyde Griffiths wrote on April 8th, 2011 at 10:24 am:
Interesting article, with a lot of thought-provoking references. The implications of the Levinson and Haviland experiment are fascinating. It would be interesting, too, to know what effect different sets of temporal words have on our thought patterns. There are Native American languages that have no separate words for 'today,' 'yesterday,' and 'tomorrow,' for example; and Chinese doesn't have a past or future tense, at all.
@C12VT We also have lots of rapidly aging visual idioms: the 'save' icon on many computer applications is actually an image of a 3.5" floppy (which are, of course, extinct); and many telephony applications use an image of an old handset for theirs. It makes me think we're headed towards the world of Neal Stephenson's The Diamond Age, where even people who were illiterate could at least read 'mediaglyphs,' which were essentially the standardized icons for things like 'save,' 'open,' and 'close.'
One useful definition I learned from this article is 'code-switching.' This is something I've run into before, and it kind of drives me crazy when I see it. And not just in science fiction: Hemingway did this a lot in For Whom the Bell Tolls, with lines like, '"Me voy," said Robert Jordan. "I go."' and '"No es bueno," said Pedro. "It's no good."' It got kind of redundant after 400 pages, and then the characters aren't actually saying everything twice, are they.
The quote Ms. Mandelo offers from China Mountain Zhang goes even further since it seems like it's transliterating Chinese grammar into English. I don't enjoy this since it sounds choppy and unrefined ('Request you to my home come'), but the native Chinese doesn't actually sound like that, and a translator would never produce that output for you. Plus, even if the reader understands Chinese, many of the Chinese people I know have said they can't even read Latinized Chinese--even with the proper tonal markings--because there's just too much ambiguity. I guess I prefer dialectal world building, since I've seen several good examples of that.
Diana Parparita wrote on November 17th, 2011 at 10:03 am:
If I may point out something about the words of a language affecting perception, I think the example about languages that use compass-point directions versus body-oriented directions is not relevant. The choice of words for directions comes from mentality, not language. I'll give you an example to illustrate this. I'm Romanian, and while my mother tongue does contain words for north, south, east and west, directions are given using the body-oriented system. One day, my English teacher, who is a native speaker of English and comes from the UK, noticed that Romanians aren't really aware of which side of a town is the east side and which the west side. We were equally surprised to hear that an Englishman always took the compass into consideration when referring to a side of town. We were surprised that anyone would care about east and west. The west side of a Romanian town isn't all that different from the east side, and when there are differences, they are never linked to the geographic position of the area. He then explained that traditionally, in England, one of these sides (I forget which) is less polluted than the other, because the wind would be blowing the smoke and pollution away over the other side. Hence that side of town is always more fashionable. Wind blows from all directions in Romania, and industrial smoke tends to spread around evenly, which is what makes the distinction between east and west irrelevant. That's why Romanians always give directions in the more immediate system of body-oriented words. It's also why we would never think of arranging objects in relation to the north, for instance, as relevant. I think that if a Romanian moved to the UK, where the east side is, as far as I've heard, noticeably different from the west side, that Romanian would, in time, become aware of the difference between east and west, and would reconsider not just the way in which they arrange objects, but also the way in which they give directions.
The reason why a study on the system a language uses for directions would never be relevant to the way a language affects perception is that, regardless of the choice of words for directions, all languages have words for east, west, north, south, and for right, left, in front of and behind. At least as far as I know. All these notions exist in both types of languages: those that use a compass-point directions system, and those that use a body-oriented system. What makes the languages different is that speakers choose which set of notions is more relevant to them when giving directions. And, as shown in my example, that has to do with the environment and lifestyle of the people. Personally, even though I am Romanian and I will always give directions such as "go right" or "go left" when telling you how to get to the nearest supermarket, I do say "go north" or "go south" when asked for directions to get to a town I've never been to, one I've only seen on the map. That is because then, in relation to a map, the cardinal-point system is the one that's most relevant. It is my perception of what's relevant and what isn't that affects my choice of words, and not the other way around.