This is part one of a six-part series on the history of natural language processing.
We’re in the middle of a boom time for natural language processing (NLP), the field of computer science that focuses on linguistic interactions between humans and machines. Thanks to advances in machine learning over the past decade, we’ve seen vast improvements in speech recognition and machine translation software. Language generators are now good enough to write coherent news articles, and virtual agents like Siri and Alexa are becoming part of our daily lives.
Most accounts trace the origins of this field back to the beginning of the computer age, when Alan Turing, writing in 1950, imagined a smart machine that could interact fluently with a human via typed text. For this reason, machine-generated language is mostly understood as a digital phenomenon, and as a central goal of artificial intelligence (AI) research.
This six-part series will challenge that common understanding of NLP. In fact, attempts to design formal rules and machines that can analyze, process, and generate language go back hundreds of years.
While specific technologies have changed over time, the basic idea of treating language as a material that can be artificially manipulated by rule-based systems has been pursued by many people in many cultures and for many different reasons. These historical experiments reveal the promise and perils of attempting to simulate human language in non-human ways—and they hold lessons for today’s practitioners of cutting-edge NLP techniques.
The story begins in medieval Spain. In the late 1200s, a Jewish mystic by the name of Abraham Abulafia sat down at a table in his small house in Barcelona, picked up a quill, dipped it in ink, and began combining the letters of the Hebrew alphabet in strange and seemingly random ways. Aleph with Bet, Bet with Gimmel, Gimmel with Aleph and Bet, and so on.
Abulafia called this practice “the science of the combination of letters.” He wasn’t actually combining letters at random; instead he was carefully following a secret set of rules that he had devised while studying an ancient Kabbalistic text called the Sefer Yetsirah. This book describes how God created “all that is formed and all that is spoken” by combining Hebrew letters according to sacred formulas. In one section, God exhausts all possible two-letter combinations of the 22 Hebrew letters.
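To get a modern feel for what exhausting every two-letter combination involves, here is a minimal Python sketch. The transliterated letter names and the function name are assumptions made for this illustration, and the sketch only enumerates pairs; it says nothing about the secret rules Abulafia himself followed.

```python
from itertools import combinations, permutations

# Transliterated names of the 22 Hebrew letters. The spellings are an
# assumption for this example; romanizations vary between sources.
LETTERS = [
    "Aleph", "Bet", "Gimel", "Dalet", "He", "Vav", "Zayin", "Het",
    "Tet", "Yod", "Kaf", "Lamed", "Mem", "Nun", "Samekh", "Ayin",
    "Pe", "Tsadi", "Qof", "Resh", "Shin", "Tav",
]


def two_letter_combinations(letters):
    """Return every unordered pair of distinct letters (hypothetical helper)."""
    return list(combinations(letters, 2))


# Unordered pairs: Aleph-Bet and Bet-Aleph count as one combination.
pairs = two_letter_combinations(LETTERS)

# Ordered pairs: Aleph-Bet and Bet-Aleph count as two different ones.
ordered_pairs = list(permutations(LETTERS, 2))

print(len(pairs))          # 22 * 21 / 2 = 231
print(len(ordered_pairs))  # 22 * 21 = 462
print(pairs[:3])
```

Even this tiny rule set yields a few hundred results, which hints at how quickly the space grows once longer sequences of letters are allowed.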
By studying the Sefer Yetsirah, Abulafia gained the insight that linguistic symbols can be manipulated with formal rules in order to create new, interesting, insightful sentences. To this end, he spent months generating thousands of combinations of the 22 letters of the Hebrew alphabet and eventually emerged with a series of books that he claimed were endowed with prophetic wisdom.
For Abulafia, generating language according to divine rules offered insight into the sacred and the unknown, or as he put it, allowed him to “grasp things which by human tradition or by thyself thou would not be able to know.”
But other Jewish scholars considered this rudimentary language generation a dangerous act that bordered on the profane. The Talmud tells stories of rabbis who, by the magical act of permuting language according to the formulas set out in the Sefer Yetsirah, created artificial creatures called golems. In these tales, rabbis manipulated the letters of the Hebrew alphabet to replicate God’s act of creation, using the sacred formulas to imbue inanimate objects with life.
In some of these myths, the rabbis used this skill for practical reasons, to make animals to eat when hungry or servants to help them with domestic duties. But many of these golem stories end badly. In one particularly well-known fable, Judah Loew ben Bezalel, the 16th-century rabbi of Prague, used the sacred practice of letter combinatorics to conjure a golem to protect the Jewish community from antisemitic attacks, only to see the golem turn violently on him instead.
This “science of the combination of letters” was a rudimentary form of natural language processing, as it involved combining letters of the Hebrew alphabet according to specific rules. For Kabbalists, it was a double-edged sword: a way to access new forms of knowledge and wisdom, but also an inherently dangerous practice that could bring about unintended consequences.
This tension reappears throughout the long history of language processing, and still echoes in discussions about the most cutting-edge NLP technology of our digital era.
This is the first installment of a six-part series on the history of natural language processing. Come back next Monday for part two, which brings us to the Enlightenment, when Gottfried Wilhelm Leibniz dreamed of a machine that could calculate ideas.
You can also check out our prior series on the untold history of AI.