When you think of punctuation, do you think of math?

You probably should, as scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow explain in a study reported by Phys.org. The study compared punctuation across seven, mainly Western, languages and concluded that it’s strangely mathematical: the same statistical features of punctuation usage patterns were observed in several hundred works written in all of these languages.

Punctuation, then, turns out to be a universal and indispensable complement to the mathematical perfection of every language studied.

“The present analyses are an extension of our earlier results on the multifractal features of sentence length variation in works of world literature. After all, what is sentence length? It is nothing more than the distance to the next specific punctuation mark: the full stop. So now we have taken all punctuation marks under a statistical magnifying glass, and we have also looked at what happens to punctuation during translation,” says Prof. Stanislaw Drozdz (IFJ PAN, Cracow University of Technology).

The study included such writers as Conrad, Dickens, Doyle, Hemingway, Kipling, Orwell, Salinger, Woolf, Grass, Kafka, Mann, Nietzsche, Goethe, La Fayette, Dumas, Hugo, Proust, Verne, Eco, Cervantes, Sienkiewicz and Reymont… in short, all of the classics.

The Weibull distribution

The attention of the Cracow researchers was primarily drawn to the statistical distribution of the distance between consecutive punctuation marks. It soon became evident that in all the languages studied, it was best described by one of the precisely defined variants of the Weibull distribution:

[Figure: a Weibull distribution curve. Image: Phys.org]

A curve of this type has a characteristic shape (see above): it rises rapidly at first and then, after reaching its maximum, falls somewhat more slowly to a certain critical value, below which it approaches zero ever more gradually. The Weibull distribution is usually used to describe survival phenomena (e.g. population as a function of age), but also various physical processes, such as the increasing fatigue of materials.
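To make the measurement concrete, here is a minimal Python sketch; it is not the researchers' own code. It counts the words between consecutive punctuation marks and evaluates the standard two-parameter Weibull density. The function names, the set of marks counted, and the choice of the continuous two-parameter form are all assumptions made for illustration, since the study relied on its own precisely defined variant.

```python
import math
import re

# Assumption for this sketch: only these marks count as punctuation.
PUNCTUATION = set(".,;:!?")


def punctuation_gaps(text):
    """Return the number of words between consecutive punctuation marks."""
    gaps = []
    words_since_mark = 0
    # Tokens are either words (apostrophes allowed inside) or single marks.
    for token in re.findall(r"\w+(?:['’]\w+)*|[.,;:!?]", text):
        if token in PUNCTUATION:
            # Skip empty gaps such as the one inside "?!".
            if words_since_mark > 0:
                gaps.append(words_since_mark)
            words_since_mark = 0
        else:
            words_since_mark += 1
    return gaps


def weibull_pdf(x, shape, scale):
    """Standard two-parameter Weibull density, evaluated here for x > 0."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


sample = "Call me Ishmael. Some years ago, never mind how long precisely, I went to sea."
print(punctuation_gaps(sample))  # [3, 3, 5, 4]
```

With a shape parameter above 1, this density rises to a peak and then tails off, which is the profile described above.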

“Punctuation thus seems to be an integral part of all the languages studied,” notes Prof. Drozdz. “And since the Weibull distribution is concerned with phenomena such as survival, it can be said with not too much tongue-in-cheek that punctuation has in its nature a literally embedded struggle for survival.”

The hazard function

The next stage of the analyses consisted of determining the so-called hazard function. In the case of punctuation, it describes how likely a punctuation mark is to appear next, given that no such mark has appeared since the previous one.
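As a rough illustration of the idea, and not the study's actual procedure, an empirical hazard function can be estimated directly from a list of gap lengths (the number of words between consecutive punctuation marks). For each length k, it is the share of gaps that end at exactly k words among all gaps that lasted at least k words. The function name and the sample data below are hypothetical.

```python
from collections import Counter


def empirical_hazard(gaps):
    """Map each observed gap length k to P(gap == k | gap >= k)."""
    counts = Counter(gaps)
    remaining = len(gaps)  # number of gaps whose length is >= the current k
    hazard = {}
    for k in sorted(counts):
        hazard[k] = counts[k] / remaining
        remaining -= counts[k]
    return hazard


# Hypothetical gap lengths, in words, between consecutive punctuation marks.
print(empirical_hazard([3, 3, 5, 4]))  # {3: 0.5, 4: 0.5, 5: 1.0}
```

A hazard that climbs with k means a punctuation mark becomes more likely the longer a run of words has gone without one.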

The hazard function curves for punctuation marks in the seven languages studied appeared to follow a similar pattern. However, the language where a punctuation mark is least likely to appear next is English, with Spanish not far behind; Slavic languages proved to be the most punctuation-dependent.

In other words, English and Spanish, currently the most universal languages, appear to be less strict about the frequency of punctuation use. It is likely that these languages are so formalized in terms of sentence construction that there is less room for ambiguity that would need to be resolved with punctuation marks.

As for German, it proved to be the exception. German punctuation seems to combine the punctuation features of many languages, making it a kind of Esperanto punctuation.

Not lost in translation?

The above observation dovetails with the next analysis, which was to see whether the punctuation features of original literary works can be seen in their translations. As expected, the language most faithfully transforming punctuation from the original language to the target language turned out to be German.

In spoken communication, pauses can be justified by human physiology, such as the need to catch one’s breath or to take a moment to structure what is to be said next in one’s mind. And in written communication?

“Creating a sentence by adding one word after another while ensuring that the message is clear and unambiguous is a bit like tightening the string of a bow: it is easy at first, but becomes more demanding with each passing moment. If there are no ordering elements in the text (and this is the role of punctuation), the difficulty of interpretation increases as the string of words lengthens. A bow that is too tight can break, and a sentence that is too long can become unintelligible. Therefore, the author is faced with the necessity of ‘freeing the arrow’, i.e. closing a passage of text with some sort of punctuation mark. This observation applies to all the languages analyzed, so we are dealing with what could be called a linguistic law,” states Dr. Tomasz Stanisz (IFJ PAN), first author of the article in question.

Finally, it is worth noting that the invention of punctuation is relatively recent—punctuation marks did not occur at all in old texts. The emergence of optimal punctuation patterns in modern written languages can therefore be interpreted as the result of their evolutionary advancement. However, the excessive need for punctuation is not necessarily a sign of such sophistication.

For something this new, it’s surprisingly universal, which makes me wonder how language may evolve next. What kinds of literary wonders await future generations, still unimagined by us?