Zipf Drive

I've long been fascinated with the omnipresence of power-law statistics in natural and social phenomena. A good example is Zipf's Law for the usage of English words, named for the 20th-century linguist George Kingsley Zipf. The most common word, the, is used twice as often as the second most popular word (of) and three times as often as the third (and). More generally, the nth most popular word is used 1/n as often as the most popular one.
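For readers who want to see the arithmetic, here is a minimal sketch of the idealized law. It assumes the exact 1/n form (real word counts only approximate it), and the function names and the 100 000-word vocabulary size are illustrative choices, not anything from a particular corpus. Dividing each 1/n by the harmonic number H_N turns the relative frequencies into probabilities that sum to 1.

```python
from fractions import Fraction

def zipf_relative(rank: int) -> Fraction:
    """Frequency of the word at `rank` relative to the rank-1 word: exactly 1/rank."""
    return Fraction(1, rank)

def zipf_probabilities(vocab_size: int) -> list[float]:
    """Normalized probabilities p_n = (1/n) / H_N for ranks n = 1..N,
    where H_N is the N-th harmonic number, so the list sums to 1."""
    harmonic = sum(1.0 / n for n in range(1, vocab_size + 1))
    return [1.0 / (n * harmonic) for n in range(1, vocab_size + 1)]

# Ranks 1, 2, 3 correspond to "the", "of", and "and" in the column.
print([zipf_relative(n) for n in (1, 2, 3)])
# → [Fraction(1, 1), Fraction(1, 2), Fraction(1, 3)]

probs = zipf_probabilities(100_000)   # assumed vocabulary size (hypothetical)
print(round(sum(probs), 6), round(probs[0] / probs[2], 6))
# → 1.0 3.0
```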

Thus, the curve of popularity versus rank shows a steep decline at first, followed by a long tail that looks rather flat when plotted on a linear scale. (On a log-log plot, of course, this becomes a straight line.) A word like omnipresence is way out on the tail, at popularity position 74 228, right before the word Borodin (the Russian composer), according to WordCount (https://wordcount.org).
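The straight line follows directly from the 1/n form: if f_n = 1/n, then log f_n = -log n, a line of slope -1 on log-log axes. The sketch below checks this with an ordinary least-squares fit of log frequency against log rank. It is a quick illustration under idealized data, not a recommended way to estimate exponents from real, noisy word counts; the helper name is hypothetical.

```python
import math

def loglog_slope(freqs: list[float]) -> float:
    """Least-squares slope of log(frequency) versus log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

zipf = [1.0 / rank for rank in range(1, 10_001)]   # ideal Zipf curve, 10 000 ranks
print(round(loglog_slope(zipf), 6))                 # → -1.0
print(round(loglog_slope([1.0] * 100), 6))          # zero slope: a flat, uniform curve
```

On a linear plot the same data look like a cliff followed by a near-flat tail, since rank 1000 already sits at one-thousandth of the top frequency.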

All of the most common words are short, resulting in a very efficient transmission of information. I imagine our distant ancestors sitting around the fire, drawing information-theory equations with sticks in the mud to come up with an optimally parsimonious language, after which they would decide that they shouldn't have used the word parsimonious (popularity number 49 309) when something like concise would have sufficed.

All this is to say that our vocabulary is a rather perfect blend: 100 or so popular words used in everyday conversation and writing, together with about 100 000 more esoteric words that get sprinkled in for effect or special purpose.
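How much of everyday text do those 100 or so popular words carry? A minimal sketch, assuming an ideal 1/n Zipf law over a total vocabulary rounded to 100 000 words (both idealizations; the helper name is hypothetical), computes the share of all word occurrences that come from the top ranks and contrasts it with a uniform vocabulary.

```python
def head_share(top_k: int, vocab_size: int) -> float:
    """Fraction of all word occurrences that come from ranks 1..top_k when
    word frequencies follow the ideal Zipf law f_n proportional to 1/n."""
    head = sum(1.0 / n for n in range(1, top_k + 1))
    total = sum(1.0 / n for n in range(1, vocab_size + 1))
    return head / total

TOP_K = 100          # "100 or so popular words"
VOCAB = 100_000      # assumed total vocabulary size

zipf_head = head_share(TOP_K, VOCAB)
uniform_head = TOP_K / VOCAB   # if every word were equally likely

print(f"Zipf:    head {zipf_head:.1%}, tail {1 - zipf_head:.1%}")
print(f"Uniform: head {uniform_head:.1%}")
# → Zipf:    head 42.9%, tail 57.1%
# → Uniform: head 0.1%
```

Under these assumptions the 100-word head supplies a bit over 40 percent of running text, while the 99 900-word tail, each word individually rare, collectively supplies the rest. That is the sense in which a long tail can rival the head.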

Many other phenomena exhibit power-law (that is, polynomial) statistics: cities ranked by population, individuals by wealth, earthquakes by strength, Web sites by number of hits, books by online sales. I would even imagine that it applies to something like the distribution of knowledge in electrical engineering. All of us know Ohm's Law, for example, but perhaps only a tenth of us are familiar with the basic concepts in communications. Then maybe only one engineer in 1000 is familiar with a particular protocol, and only one in 100 000 might be conversant with a particular paper in a specific IEEE Transactions. But this is what makes the world go round; we have a lot of things in common, but there is a long tail of specialties that makes each individual unique.

Although power-law statistics have been long known, the subject has gotten much recent attention under the name "the long tail," a phrase coined by Chris Anderson, the editor in chief of Wired magazine, in an article in 2004. Discussions have been prompted by the difference between sales in the physical world, where inventories are limited to the popular items, and those in the virtual world of the Internet, where there is no inventory constraint to eliminate all the rare items on the long tail. In the virtual world, the many small sales out on the long tail approximately equal the sales of the few most popular items.

In most cases there are fundamental reasons that statistics behave like a power law. For example, even though it might seem as if individual choices should be uniformly distributed among alternatives, an individual's choice is often influenced by the choices of others. This explains our herdlike behavior, with a flocking around popular choices and a long tail of individual dissent.
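The column does not commit to a specific mechanism, so the sketch below swaps in one standard model of imitation: Herbert Simon's 1955 "rich get richer" process. This is our choice of model, and the function name, step count, and the 10 percent novelty rate are arbitrary illustrative parameters. Each new chooser either strikes out on a brand-new option or copies a randomly chosen earlier chooser, which favors options in proportion to their existing popularity.

```python
import random
from collections import Counter

def simon_process(steps: int, p_new: float, seed: int = 0) -> list[int]:
    """Simulate `steps` successive choices. Each chooser either picks a brand-new
    option (probability p_new) or imitates a randomly chosen earlier chooser.
    Returns how many times each option was chosen, most popular first."""
    rng = random.Random(seed)       # seeded so runs are reproducible
    history = [0]                   # the first chooser picks option 0
    next_option = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            history.append(next_option)      # independent, novel choice
            next_option += 1
        else:
            # Copying a uniformly random earlier choice selects each option with
            # probability proportional to how often it has already been chosen.
            history.append(rng.choice(history))
    return sorted(Counter(history).values(), reverse=True)

counts = simon_process(steps=200_000, p_new=0.1)
print("options:", len(counts))
print("top five counts:", counts[:5])
print("options chosen only once:", sum(1 for c in counts if c == 1))
```

In Simon's analysis the rank-frequency curve falls off roughly as n^-(1 - p_new), so a small p_new gives something close to Zipf's 1/n: a handful of runaway favorites and a long tail of options chosen once or twice. With p_new = 1 (no imitation at all) every choice is unique and the distribution is perfectly flat.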

How could it be otherwise? Suppose for a moment that power-law statistics weren't the norm and that choices were uniformly distributed. What would the world be like? With all 100 000 or so words equally likely, books would be long and turgid but of little interest, because there would be so few subjects of common concern. And of course it would be almost impossible to learn a foreign language.

Population would be uniformly scattered about the Earth. There would be no cities, and whole countries would be like New Jersey, where I have to describe my home's location by the nearest exit number on the Garden State Parkway. For better or for worse, wealth would be uniformly distributed, and perhaps neither cathedrals nor slums would be so prevalent.

I'm sure that you can provide your own suppositions, but perhaps we could all agree that we wouldn't want to inhabit such a world. Our ancient ancestors around the fire figured this out a long time ago.

About the Author

ROBERT W. LUCKY considers how power-law statistics apply to language in this month's Reflections column. Lucky, an IEEE Fellow, now retired, was vice president for applied research at Telcordia Technologies in Red Bank, N.J.
