# Andrey Markov & Claude Shannon Counted Letters to Build the First Language-Generation Models

## Shannon said: “OCRO HLI RGWR NMIELWIS”

Russian mathematician Andrey Andreyevich Markov in front of his statistical analyses of Alexander Pushkin’s novel, Eugene Onegin.
Photo-illustration: Gluekit

This is part three of a six-part series on the history of natural language processing.

In 1913, the Russian mathematician Andrey Andreyevich Markov sat down in his study in St. Petersburg with a copy of Alexander Pushkin’s 19th-century verse novel, Eugene Onegin, a literary classic at the time. Markov, however, did not start reading Pushkin’s famous text. Rather, he took a pen and a piece of drafting paper and wrote out the first 20,000 letters of the book as one long, unbroken string, eliminating all punctuation and spaces. Then he arranged these letters in 200 grids (10-by-10 characters each) and began counting the vowels in every row and column, tallying the results.

To an onlooker, Markov’s behavior would have appeared bizarre. Why would someone deconstruct a work of literary genius in this way, rendering it incomprehensible? But Markov was not reading the book to learn lessons about life and human nature; he was searching for the text’s more fundamental mathematical structure.

In separating the vowels from the consonants, Markov was testing a theory of probability that he had been developing since 1909. Up until that point, the field of probability had been mostly limited to analyzing phenomena like roulette or coin flipping, where the outcome of previous events does not change the probability of current events. But Markov felt that most things happen in chains of causality and are dependent on prior outcomes. He wanted a way of modeling these occurrences through probabilistic analysis.

Language, Markov believed, was an example of a system where past occurrences partly determine present outcomes. To demonstrate this, he wanted to show that in a text like Pushkin’s novel, the chance of a certain letter appearing at some point in the text is dependent, to some extent, on the letter that came before it.

To do so, Markov began counting vowels in Eugene Onegin and found that 43 percent of letters were vowels and 57 percent were consonants. Then he examined each pair of adjacent letters and sorted the pairs by type. He found 1,104 vowel-vowel pairs, 3,827 consonant-consonant pairs, and 15,069 vowel-consonant and consonant-vowel pairs. Statistically speaking, this demonstrated that for any given letter in Pushkin’s text, if it was a vowel, the next letter was likely to be a consonant, and vice versa.
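The arithmetic behind that conclusion can be sketched in a few lines of Python. The sketch assumes each letter begins exactly one pair, so the number of pairs starting with a vowel is taken to equal the number of vowels (43 percent of 20,000, or 8,600); Markov's own tables may have been arranged differently. The helper `transition_probs`, which applies the same counting to any string, is a hypothetical illustration that uses English vowels in place of Markov's Russian ones.

```python
from collections import Counter

# Figures quoted above for the first 20,000 letters of Eugene Onegin.
TOTAL_LETTERS = 20_000
VOWELS = TOTAL_LETTERS * 43 // 100      # 43 percent vowels: 8,600
CONSONANTS = TOTAL_LETTERS - VOWELS     # 57 percent consonants: 11,400
VOWEL_VOWEL_PAIRS = 1_104
CONSONANT_CONSONANT_PAIRS = 3_827

# Assumption: each letter starts one pair, so pairs beginning with a vowel
# number about the same as the vowels themselves.
p_vowel_after_vowel = VOWEL_VOWEL_PAIRS / VOWELS
p_consonant_after_consonant = CONSONANT_CONSONANT_PAIRS / CONSONANTS

print(f"P(vowel)                          = {VOWELS / TOTAL_LETTERS:.3f}")
print(f"P(vowel | previous vowel)         = {p_vowel_after_vowel:.3f}")
print(f"P(consonant)                      = {CONSONANTS / TOTAL_LETTERS:.3f}")
print(f"P(consonant | previous consonant) = {p_consonant_after_consonant:.3f}")


def transition_probs(text, vowels="aeiou"):
    """Estimate P(next class | current class) from adjacent letter pairs.

    Hypothetical helper: classes are 'v' (vowel) and 'c' (consonant), and
    spaces and punctuation are dropped first, as Markov did.
    """
    kinds = ["v" if ch in vowels else "c" for ch in text.lower() if ch.isalpha()]
    pairs = Counter(a + b for a, b in zip(kinds, kinds[1:]))
    probs = {}
    for first in "vc":
        total = pairs[first + "v"] + pairs[first + "c"]
        for second in "vc":
            probs[first + second] = pairs[first + second] / total if total else 0.0
    return probs


print(transition_probs("Eugene Onegin"))
```

With these figures, a vowel follows a vowel only about 12.8 percent of the time, against a 43 percent chance for a letter chosen without regard to its neighbor: the kind of dependence on the preceding letter that Markov set out to show.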

Markov used this analysis to demonstrate that Pushkin’s Eugene Onegin wasn’t just a random distribution of letters but had some underlying statistical qualities that could be modeled. The enigmatic research paper that came out of this study, entitled “An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains,” was not widely cited in Markov’s lifetime and was not translated into English until 2006. But some of its central concepts around probability and language spread across the globe and were eventually re-articulated in Claude Shannon’s hugely influential paper, “A Mathematical Theory of Communication,” which came out in 1948.

Shannon’s paper outlined a way to precisely measure the quantity of information in a message and, in doing so, laid the foundations for a theory of information that would come to define the digital age. Shannon was fascinated by Markov’s idea that the likelihood of a given letter or word appearing in a text could be approximated. Like Markov, Shannon demonstrated this by performing textual experiments that involved building a statistical model of language. He then went a step further, using the model to generate text according to those statistical rules.
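Shannon's central measure is entropy, the average information per symbol: H = −Σ p log₂ p, summed over symbols that occur with probabilities p. A minimal Python sketch, estimating each p from symbol counts in a string (the function name `entropy_bits` and the sample strings are illustrative choices, not from Shannon's paper):

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)) in bits per symbol.

    Each probability p is estimated as the relative frequency of a symbol
    in the given sequence.
    """
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# 27 equally likely symbols (26 letters plus a space): log2(27) bits each.
uniform = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
print(f"uniform alphabet: {entropy_bits(uniform):.2f} bits/symbol")

# A skewed distribution carries less information per symbol.
print(f"skewed sample:    {entropy_bits('EEEEEEEEEETTTTAQ'):.2f} bits/symbol")
```

A uniform 27-symbol alphabet gives log₂ 27, about 4.75 bits per symbol; any skew in the frequencies lowers the figure, which is one way of saying that real text is partly predictable.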

In an initial control experiment, he generated a sentence by picking letters at random from a 27-symbol alphabet (26 letters plus a space) and got the following output:

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD
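This "zero-order" approximation is easy to re-create. The Python sketch below is a modern illustration, not Shannon's own procedure; the function name `zero_order` and the seed are arbitrary choices, and the output will not match Shannon's line.

```python
import random
import string

# The 27-symbol alphabet described above: 26 capital letters plus a space.
ALPHABET = string.ascii_uppercase + " "


def zero_order(length, rng=random):
    """Pick each symbol independently, every symbol with probability 1/27."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# Seeded only so repeated runs print the same line.
print(zero_order(60, random.Random(1948)))
```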

The sentence was meaningless noise, Shannon said, because when we communicate we don’t choose letters with equal probability. As Markov had shown, consonants are more likely than vowels. But at a finer level of granularity, E is more common than S, which in turn is more common than Q. To account for this, Shannon amended his original alphabet so that it modeled the letter frequencies of English more closely: E accounts for more than 10 percent of the letters in typical English text, while Q accounts for well under 1 percent, so he was far more likely to draw an E than a Q. When he again drew letters at random from this recalibrated alphabet, he got a sentence that came a bit closer to English.

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.
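The recalibrated draw can be sketched by weighting each symbol by how often it appears in some English text. Here the weights come from a short hypothetical sample string, so the frequencies are only a crude stand-in for the English letter statistics Shannon used; `first_order` is an illustrative name.

```python
import random
from collections import Counter

# Hypothetical sample used to estimate letter frequencies; a serious model
# would count letters in a much larger body of English text.
SAMPLE = (
    "THE SENTENCE WAS MEANINGLESS NOISE BECAUSE WHEN WE COMMUNICATE "
    "WE DO NOT CHOOSE LETTERS WITH EQUAL PROBABILITY"
)


def first_order(sample, length, rng=random):
    """Draw symbols independently, weighted by their counts in sample."""
    counts = Counter(sample)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))


print(first_order(SAMPLE, 60, random.Random(1948)))
```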

In a series of subsequent experiments, Shannon demonstrated that as the statistical model grows more complex, the results become increasingly comprehensible. Shannon, via Markov, revealed a statistical framework for the English language and showed that by modeling this framework (that is, by analyzing the dependent probabilities of letters and words appearing in combination with each other) he could actually generate language.
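One step up in complexity is to make each letter depend on the one before it, the same kind of dependence Markov measured. A minimal sketch of such a letter-pair (bigram) chain, assuming a short hypothetical training string; the function names are illustrative, and a realistic model would need far more text:

```python
import random
from collections import defaultdict

# Hypothetical training text; real letter statistics need far more of it.
SAMPLE = (
    "THE MORE COMPLEX THE STATISTICAL MODEL OF A GIVEN TEXT THE MORE "
    "ACCURATE THE LANGUAGE GENERATION BECOMES"
)


def build_bigram_model(text):
    """Map each symbol to the list of symbols that follow it in text.

    Duplicates are kept, so a uniform pick from a list reproduces the
    observed conditional frequencies P(next | current).
    """
    model = defaultdict(list)
    for current, nxt in zip(text, text[1:]):
        model[current].append(nxt)
    return model


def generate(model, start, length, rng=random):
    """Walk the chain: each new symbol depends only on the previous one."""
    out = [start]
    for _ in range(length - 1):
        # Dead end (symbol never seen with a follower): restart from any known symbol.
        followers = model.get(out[-1]) or list(model)
        out.append(rng.choice(followers))
    return "".join(out)


model = build_bigram_model(SAMPLE)
print(generate(model, "T", 60, random.Random(1948)))
```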

The more complex the statistical model of a given text, the more accurate the language generation becomes, or, as Shannon put it, the greater the “resemblance to ordinary English text.” In the final experiment, Shannon drew from a corpus of words instead of letters, choosing each word according to the word that preceded it, and achieved the following:

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
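The same chain works with words as the units. A hypothetical sketch, trained on a made-up sentence rather than on anything Shannon used, so its output will only loosely resemble the passage above; the function names are illustrative:

```python
import random
from collections import defaultdict

# Hypothetical training sentence, invented for illustration.
SAMPLE = (
    "the head of the line is the problem for the writer and the time of "
    "the writer is the method for the letters of the line"
)


def build_word_model(text):
    """Map each word to the list of words that follow it (duplicates kept)."""
    words = text.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model


def generate_words(model, start, n_words, rng=random):
    """Choose each word at random from those seen after the previous word."""
    out = [start]
    for _ in range(n_words - 1):
        followers = model.get(out[-1]) or list(model)  # restart on a dead end
        out.append(rng.choice(followers))
    return " ".join(out).upper()


word_model = build_word_model(SAMPLE)
print(generate_words(word_model, "the", 20, random.Random(1948)))
```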

For both Shannon and Markov, the insight that language’s statistical properties could be modeled offered a way to rethink broader problems that they were working on.

For Markov, it extended the study of stochasticity beyond mutually independent events, paving the way for a new era in probability theory. For Shannon, it helped him formulate a precise way of measuring and encoding units of information in a message, which revolutionized telecommunications and, eventually, digital communication. But their statistical approach to language modeling and generation also ushered in a new era for natural language processing, which has ramified through the digital age to this day.

This is the third installment of a six-part series on the history of natural language processing. Last week’s post described Leibniz’s proposal for a machine that combined concepts to form reasoned arguments. Come back next Monday for part four, “Why People Demanded Privacy to Confide in the World’s First Chatbot.”

You can also check out our prior series on the untold history of AI.

## Will AI Steal Submarines’ Stealth?

### Better detection will make the oceans transparent—and perhaps doom mutually assured destruction

The Virginia-class fast-attack submarine USS Virginia cruises through the Mediterranean in 2010. Back then, it could effectively disappear just by diving.

U.S. Navy

Submarines are valued primarily for their ability to hide. The assurance that submarines would likely survive the first missile strike in a nuclear war and thus be able to respond by launching missiles in a second strike is key to the strategy of deterrence known as mutually assured destruction. Any new technology that might render the oceans effectively transparent, making it trivial to spot lurking submarines, could thus undermine the peace of the world. For nearly a century, naval engineers have striven to develop ever-faster, ever-quieter submarines. But they have worked just as hard at advancing a wide array of radar, sonar, and other technologies designed to detect, target, and eliminate enemy submarines.

The balance seemed to turn with the emergence of nuclear-powered submarines in the early 1960s. In a 2015 study for the Center for Strategic and Budgetary Assessments, Bryan Clark, a naval specialist now at the Hudson Institute, noted that the ability of these boats to remain submerged for long periods of time made them “nearly impossible to find with radar and active sonar.” But even these stealthy submarines produce subtle, very-low-frequency noises that can be picked up from far away by networks of acoustic hydrophone arrays mounted on the seafloor.

And now the game of submarine hide-and-seek may be approaching the point at which submarines can no longer elude detection and simply disappear. It may come as early as 2050, according to a recent study by the National Security College of the Australian National University, in Canberra. This timing is particularly significant because the enormous costs required to design and build a submarine are meant to be spread out over at least 60 years. A submarine that goes into service today should still be in service in 2082. Nuclear-powered submarines, such as the Virginia-class fast-attack submarine, each cost roughly US \$2.8 billion, according to the U.S. Congressional Budget Office. And that’s just the purchase price; the total life-cycle cost of the new Columbia-class ballistic-missile submarine program is estimated to exceed \$395 billion.

The twin problems of detecting submarines of rival countries and protecting one’s own submarines from detection are enormous, and the technical details are closely guarded secrets. Many naval experts are speculating about sensing technologies that could be used in concert with modern AI methodologies to neutralize a submarine’s stealth. Rose Gottemoeller, former deputy secretary general of NATO, warns that “the stealth of submarines will be difficult to sustain, as sensing of all kinds, in multiple spectra, in and out of the water becomes more ubiquitous.” And the ongoing contest between stealth and detection is becoming increasingly volatile as these new technologies threaten to overturn the balance.

## We have new ways to find submarines

Today’s sensing technologies for detecting submarines are moving beyond merely hearing submarines to pinpointing their position through a variety of non-acoustic techniques. Submarines can now be detected by the tiny amounts of radiation and chemicals they emit, by slight disturbances in the Earth’s magnetic field, and by reflected light from laser or LED pulses. All these methods seek to detect anomalies in the natural environment, as represented in sophisticated models of baseline conditions that have been developed within the last decade, thanks in part to Moore’s Law advances in computing power.

Airborne laser-based sensors can detect submarines lurking near the surface. IEEE Spectrum

According to experts at the Center for Strategic and International Studies, in Washington, D.C., two methods offer particular promise. Lidar sensors transmit laser pulses through the water to produce highly accurate 3D scans of objects. Magnetic anomaly detection (MAD) instruments monitor the Earth’s magnetic field and can detect subtle disturbances caused by the metal hull of a submerged submarine.
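The ranging part of lidar comes down to timing a pulse's round trip. A simplified Python sketch, assuming the whole path is through seawater with a refractive index of about 1.34 (an airborne system would also have to account for the leg through air and for refraction at the surface); the function name is illustrative:

```python
# Speed of light in a vacuum, in meters per second (exact by definition).
C_VACUUM = 299_792_458.0

# Assumption: refractive index of seawater for visible light, roughly 1.34.
N_SEAWATER = 1.34


def range_from_round_trip(seconds, refractive_index=N_SEAWATER):
    """Distance to a reflector from a pulse's round-trip travel time.

    Light moves at c / n in the medium, and the pulse covers the distance
    twice (out and back), hence the division by 2.
    """
    return (C_VACUUM / refractive_index) * seconds / 2


# A return arriving 200 nanoseconds after the pulse entered the water:
print(f"{range_from_round_trip(200e-9):.1f} m")
```

Under these assumptions, a return 200 nanoseconds after the pulse enters the water corresponds to a reflector roughly 22 meters down.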

Both sensors have drawbacks. MAD works only at low altitudes or underwater. It is often not sensitive enough to pick out the disturbances caused by submarines from among the many other subtle shifts in electromagnetic fields under the ocean.

Lidar has better range and resolution and can be installed on satellites, but it consumes a lot of power—a standard automotive unit with a range of several hundred meters can burn 25 watts. Lidar is also prohibitively expensive, especially when operated in space. In 2018, NASA launched a satellite with laser imaging technology to monitor changes in Earth’s surface—notably changes in the patterns on the ocean’s surface; the satellite cost more than \$1 billion.

Indeed, where you place the sensors is crucial. Underwater sensor arrays won’t put an end to submarine stealth by themselves. Retired Rear Adm. John Gower, former submarine commander for the Royal Navy of the United Kingdom, notes that sensors “need to be placed somewhere free from being trawled or fished, free from seismic activity, and close to locations from which they can be monitored and to which they can transmit collected data. That severely limits the options available.”

One way to get around the need for precise placement is to make the sensors mobile. Underwater drone swarms can do just that, which is why some experts have proposed them as the ultimate antisubmarine capability.

Clark, for instance, notes that such drones now have enhanced computing power and batteries that can last for two weeks between charges. The U.S. Navy is working on a drone that could run for 90 days. Drones are also now equipped with the chemical, optical, and geomagnetic sensors mentioned earlier. Networked underwater drones, perhaps working in conjunction with airborne drones, may be useful not only for detecting submarines but also for destroying them, which is why several militaries are investing heavily in them.

A U.S. Navy P-8 Poseidon aircraft, equipped to detect submarines, awaits refueling in Okinawa, Japan, in 2020. U.S. Navy

For example, the Chinese Navy has invested in a fishlike undersea drone known as Robo-Shark, which was designed specifically for hunting submarines. Meanwhile, the U.S. Navy is developing the Low-Cost Unmanned Aerial Vehicle Swarming Technology (LOCUST) for conducting surveillance missions. Each LOCUST drone weighs about 6 kilograms, costs \$15,000, and can be outfitted with MAD sensors; it can skim low over the ocean’s surface to detect signals under the water. Militaries study the drone option because it might work. Then again, it very well might not.

Robo-Shark, a 2.2-meter-long submersible made by Boya Gongdao Robot Technology, of Beijing, is said to be capable of underwater surveillance and unspecified antisubmarine operations. The company says that the robot moves at up to 5 meters per second (10 knots) by using a three-joint structure to wave the caudal fin, making less noise than a standard propeller would. robosea.org

Gower considers underwater drones to be “the least likely innovation to make a difference in the decline of submarine stealth.” A navy would need a lot of drones, data rates are exceedingly slow, and a drone’s transmission range is short. Drones are also noisy and extremely easy to detect. “Not to mention that controlling thousands of underwater drones far exceeds current technological capabilities,” he adds.

Gower says it could be possible “to use drones and sonar networks together in choke points to detect submarine patrols.” Among the strategically important submarine patrol choke points are the exit routes on either side of Ireland, for U.K. submarines; those around the islands of Hainan and Taiwan, for Chinese submarines; the Barents Sea and the Kuril Island chain, for Russian submarines; and the Strait of Juan de Fuca, for U.S. Pacific submarines. On the other hand, he notes, such sensor networks “could be monitored and removed since they would be close to sovereign territories. As such, the challenges would likely outweigh the gains.”

Gower believes a more powerful means of submarine detection lies in the “persistent coverage of the Earth’s surface by commercial satellites,” which he says “represents the most substantial shift in our detection capabilities compared to the past.” More than 2,800 of these satellites are already in orbit. Governments once dominated space because the cost of building and launching satellites was so great. These days, much cheaper satellite technology is available, and private companies are launching constellations of tens to thousands of satellites that can work together to image every bit of the Earth’s surface. They are outfitted with a wide range of sensing technologies, including synthetic aperture radar (SAR), which scans a scene down below while moving over a great distance, providing results like those you’d get from an extremely long antenna. Since these satellite constellations view the same locations multiple times per day, they can capture small changes in activity.

Experts have known for decades about the possibility of detecting submarines with SAR based on the wake patterns they form as they move through the ocean. To detect such patterns, known as Bernoulli humps and Kelvin wakes, the U.S. Navy has invested in the AN/APS-154 Advanced Airborne Sensor, developed by Raytheon. The aircraft-mounted radar is designed to operate at low altitudes and appears to be equipped with high-resolution SAR and lidar sensors.

Commercial satellites equipped with SAR and other imaging instruments are now reaching resolutions that can compete with those of government satellites and offer access to customers at extremely affordable rates. In other words, there’s lots of relevant, unclassified data available for tracking submarines, and the volume is growing exponentially.

One day this trend will matter. But not just yet.

Jeffrey Lewis, director of the East Asia Nonproliferation Program at the James Martin Center for Nonproliferation Studies, regularly uses satellite imagery in his work to track nuclear developments. But tracking submarines is a different matter. “Even though this is a commercially available technology, we still don’t see submarines in real time today,” Lewis says.

The day when commercial satellite imagery reduces the stealth of submarines may well come, says Gower, but “we’re not there yet. Even if you locate a submarine in real time, 10 minutes later, it’s very hard to find again.”

## Artificial intelligence coordinates other sub-detecting tech

Though these new sensing methods have the potential to make submarines more visible, no one of them can do the job on its own. What might make them work together is the master technology of our time: artificial intelligence.

“When we see today’s potential of ubiquitous sensing capabilities combined with the power of big-data analysis,” Gottemoeller says, “it’s only natural to ask the question: Is it now finally possible?” She began her career in the 1970s, when the U.S. Navy was already worried about Soviet submarine-detection technology.

Unlike traditional software, which must be programmed in advance, the machine-learning strategy called deep learning can find patterns in data without being explicitly told what to look for. In 2021, DeepMind’s AlphaFold program achieved a breakthrough in predicting how chains of amino acids fold into proteins, yielding predicted structures for 98.5 percent of human proteins. Earlier work in games, notably Go and chess, showed that deep learning could outdo the best of the old software techniques, even when running on hardware that was no faster.

For AI to work in submarine detection, several technical challenges must be overcome. The first challenge is to train the algorithm, which involves acquiring massive volumes and varieties of sensor data from persistent satellite coverage of the ocean’s surface as well as regular underwater collection in strategic locations. Using such data, the AI can establish a detailed model of baseline conditions, then feed new data into the model to find subtle anomalies. Such automated sleuthing is what’s likeliest to detect the presence of a submarine anywhere in the ocean and predict locations based on past transit patterns.
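Real systems would rely on learned models fusing many sensors, and their details are closely guarded, but the "baseline, then anomaly" idea can be illustrated with a much simpler statistical stand-in: flag any reading that lies more than a few standard deviations from the baseline mean (a z-score test). This is not deep learning. The magnetometer-style readings below are synthetic, and the four-standard-deviation threshold is an arbitrary assumption.

```python
import statistics


def fit_baseline(readings):
    """Summarize normal conditions as a mean and a sample standard deviation."""
    return statistics.mean(readings), statistics.stdev(readings)


def find_anomalies(new_readings, baseline, threshold=4.0):
    """Return (index, value, z) for readings more than `threshold` SDs from the mean."""
    mean, std = baseline
    flagged = []
    for i, x in enumerate(new_readings):
        z = (x - mean) / std
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Synthetic field-strength readings in nanotesla, invented for illustration.
quiet_ocean = [50000.2, 49999.8, 50000.1, 49999.9, 50000.0, 50000.3, 49999.7, 50000.1]
new_pass = [50000.1, 49999.9, 50003.5, 50000.0]

baseline = fit_baseline(quiet_ocean)
for index, value, z in find_anomalies(new_pass, baseline):
    print(f"reading {index}: {value} nT is {z:+.1f} standard deviations from baseline")
```

Only the third reading in the new pass stands out here; a fielded system would face far noisier baselines, which is where the data volumes and learned models described in this section come in.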

The second challenge is collecting, transmitting, and processing the masses of data in real time. That task would require a lot more computing power than we now have, on both fixed and mobile collection platforms. But even today’s technology can start to put the various pieces of the technical puzzle together.

## Nuclear deterrence depends on the ability of submarines to hide

For some years to come, the vastness of the ocean will continue to protect the stealth of submarines. But the very prospect of greater ocean transparency has implications for global security. Concealed submarines bearing ballistic missiles provide the threat of retaliation against a first nuclear strike. What if that changes?

“We take for granted the degree to which we rely upon having a significant portion of our forces exist in an essentially invulnerable position,” Lewis says. Even if new developments did not reduce submarine stealth by much, the mere perception of such a reduction could undermine strategic stability.

A Northrop Grumman MQ-8C, an uncrewed helicopter, has recently been deployed by the U.S. Navy in the Indo-Pacific area for use in surveillance. In the future, it will also be used for antisubmarine operations. Northrop Grumman

Gottemoeller warns that “any perception that nuclear-armed submarines have become more targetable will lead to questions about the survivability of second-strike forces. Consequently, countries are going to do everything they can to counter any such vulnerability.”

Experts disagree on the irreversibility of ocean transparency. Because any technological breakthroughs will not be implemented overnight, “nations should have ample time to develop countermeasures [that] cancel out any improved detection capabilities,” says Matt Korda, senior research associate at the Federation of American Scientists, in Washington, D.C. However, Roger Bradbury and eight colleagues at the National Security College of the Australian National University disagree, claiming that any technical ability to counter detection technologies will start to decline by 2050.

Korda also points out that ocean transparency, to the extent that it occurs, “will not affect countries equally. And that raises some interesting questions.” For example, U.S. nuclear-powered submarines are “the quietest on the planet. They are virtually undetectable. Even if submarines become more visible in general, this may have zero meaningful effect on U.S. submarines’ survivability.”

Sylvia Mishra, a new-tech nuclear officer at the European Leadership Network, a London-based think tank, says she is “more concerned about the overall problem of ambiguity under the sea.” Until recently, she says, movement under the oceans was the purview of governments. Now, though, there’s a growing industry presence under the sea. For example, companies are laying many underwater fiber-optic communication cables, Mishra says, “which may lead to greater congestion of underwater inspection vehicles, and the possibility for confusion.”

A Snakehead, a large underwater drone designed to be launched and recovered by U.S. Navy nuclear-powered submarines, is shown at its christening ceremony in Narragansett Bay in Newport, R.I. U.S. Navy

Confusion might come from the fact that drones, unlike surface ships, do not bear a country flag, and therefore their ownership may be unclear. This uncertainty, coupled with the possibility that the drones could also carry lethal payloads, increases the risk that a naval force might view an innocuous commercial drone as hostile. “Any actions that hold the strategic assets of adversaries at risk may produce new touch points for conflict and exacerbate the risk of war,” says Mishra.

Given the strategic importance of submarine stealth, Gower asks, “Why would any country want to detect and track submarines? It’s only something you’d do if you want to make a nuclear-armed power nervous.” Even in the Cold War, when the United States and the U.K. routinely tracked Soviet ballistic-missile submarines, they did so only because they knew their activities would go undetected—that is, without risking escalation. Gower postulates that this was dangerously arrogant: “To actively track second-strike nuclear forces is about as escalatory as you might imagine.”

“All nuclear-armed states place a great value on their second-strike forces,” Gottemoeller says. If greater ocean transparency produces new risks to their survivability, real or perceived, she says, countries may respond in two ways: build up their nuclear forces further and take new measures to protect and defend them, producing a new arms race; or else keep the number of nuclear weapons limited and find other ways to bolster their viability.

Ultimately, such considerations have not dampened the enthusiasm of certain governments for acquiring submarines. In September 2021 the Australian government announced an enhanced trilateral partnership with the United States and the United Kingdom. The new deal, known as AUKUS, will provide Australia with up to eight nuclear-powered submarines with the most coveted propulsion technology in the world. However, it could be at least 20 years before the Royal Australian Navy can deploy the first of its new subs.

The Boeing Orca, the largest underwater drone in the U.S. Navy’s inventory, was christened in April, in Huntington Beach, Calif. The craft is designed, among other things, for use in antisubmarine warfare. The Boeing Company

As part of its plans for nuclear modernization, the United States has started replacing its entire fleet of 14 Ohio-class ballistic-missile submarines with new Columbia-class boats. The replacement program is projected to cost more than \$128 billion for acquisition and a further \$267 billion over the boats’ full life cycles. U.S. government officials and experts justify the steep cost of these submarines by pointing to their critical role in bolstering nuclear deterrence through their perceived invulnerability.

To protect the stealth of submarines, Mishra says, “There is a need for creative thinking. One possibility is exploring a code of conduct for the employment of emerging technologies for surveillance missions.”

There are precedents for such cooperation. During the Cold War, the United States and the Soviet Union set up a secure communications system—a hotline—to help prevent a misunderstanding from snowballing into a disaster. The two countries also developed a body of rules and procedures, such as never to launch a missile along a potentially threatening trajectory. Nuclear powers could agree to exercise similar restraint in the detection of submarines. The stealthy submarine isn’t gone; it still has years of life left. That gives us ample time to find new ways to keep the peace.