Last Friday, I was in a van in Denver, Colorado, with Zooko Wilcox, the CEO of ZCash, a company that on 28 October will launch a new blockchain-based digital currency of the same name. On the floor next to me was a bunch of newly purchased computer equipment. I knew we were going to a hotel, but didn’t know where. I only knew that I’d be there for the next two days straight and that it would be my job to watch, ask questions, stave off sleep, and document as much as I possibly could.
That day began a cryptographic ceremony of sorts, one that will make or break a new digital currency. ZCash is identical to Bitcoin in a lot of ways. It’s founded on a digital ledger of transactions called a blockchain that exists on an army of computers that can be anywhere in the world. But it differs from Bitcoin in one critical way: It will be completely anonymous. Although privacy was a motivating factor for Bitcoin’s flock of early adopters, Bitcoin doesn’t deliver the goods. For those who want to digitally replicate the experience of slipping on a ski mask and handing over an envelope of unmarked bills, ZCash is the new way to go.
On Friday, a series of distributed denial-of-service attacks hit Dyn, a company that provides a form of traffic control for popular websites, and interrupted some users’ access to sites including Github, Twitter, and Netflix. Since then, it has become clear that these attacks were made possible by security vulnerabilities in millions of devices within the Internet of Things.
On Monday at the National Cyber Security Alliance’s Cybersecurity Summit in New York City, industry leaders from security firms, Internet service providers, and device manufacturers fretted over the implications. Panelists spoke about the existential dangers that companies in the fast-growing IoT sector face if they continue to fail to secure these devices and debated ways in which the industry can improve security within this ecosystem.
“Friday showed us that the genie is well out of the bottle at this point,” said Andrew Lee, CEO at security company ESET North America. “This should probably be the wake-up call to manufacturers to start taking this seriously.”
While it’s still not clear who executed Friday’s attacks, Dyn has announced that hackers orchestrated them across “tens of millions” of IP addresses gathered through Mirai, malware that scans the Internet for connected devices with weak security. The malware then enlists these devices into a massive global network called a botnet. Increasingly, hackers have used these networks to launch distributed denial-of-service attacks, in which they instruct many devices to send traffic to a target at once in order to overload its capacity and prevent real users from accessing a website or service.
On Friday, multiple distributed denial-of-service (DDoS) attacks hit the Internet services company Dyn. The cyberattack prevented many users on the U.S. East Coast from navigating to the most popular websites of Dyn customers, which include Twitter, Reddit, and Netflix.
Dyn detected the first attack at 7:10 a.m. Eastern time on Friday and restored normal service about two hours later. Then at 11:52 a.m. ET, Dyn began investigating a second attack. By 2:00 p.m., the company said it was still working to resolve “several attacks” at once.
The interruptions inconvenienced many Internet users and disrupted the daily operations of Internet giants in entertainment, e-commerce, and social media. There still aren’t many details available about Dyn’s predicament, and the company did not immediately respond to an interview request. But we do know from Dyn’s posts that the first two assaults on its network were DDoS attacks. Its customers’ outages again show that major Internet companies remain vulnerable to this common hacker scheme—one that has plagued networks since 2000.
A denial-of-service attack aims to slow or stop users from accessing content or services by impeding the ability of a network or server to respond to their requests. The word “distributed” means that hackers executed the Dyn attacks by infecting and controlling a large network of computers called a botnet, rather than running them from a single machine that they own.
Hackers can assemble a botnet by spreading malware, which is often done by prompting unsuspecting users to click a link or download a file. That malware can be programmed to periodically check with a host computer owned by hackers for further instructions. To launch an attack, the hackers, or bot-herders, send a message through this “command and control” channel, prompting infected computers to send many requests for a particular website, server, or service all at once. Some of the biggest botnets in history have boasted 2 million computers, capable of sending up to 74 billion spam emails a day.
The sudden onslaught of requests quickly gobbles up all the network's bandwidth, disk space, or processing power. That means real users can’t get their requests through because the system is too busy trying to respond to all the bots. In the worst cases, a DDoS can crash a system, taking it completely offline.
Both of Friday’s attacks targeted Dyn’s Managed Domain Name System. Through this system, Dyn provides a lookup service that translates Web addresses that users type into a browser, such as spectrum.ieee.org, into numerical IP addresses. Users who type in a Web address are first sent through a Dyn server that looks up the IP address for a server that hosts the content the user is trying to reach. The Dyn server passes this information on to the user's browser.
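The lookup step described above is something any program can trigger through the operating system's resolver. A minimal sketch in Python (resolving a local name here so the demo doesn't depend on outside servers; for a Dyn customer's domain, a Dyn server would answer this query):

```python
import socket

def resolve(hostname):
    """Ask the system's configured DNS resolver for a host's IP addresses,
    the same name-to-address translation a browser performs before it can
    fetch a page."""
    results = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
    # the first element of sockaddr is the IP address string.
    return sorted({sockaddr[0] for *_, sockaddr in results})

# Resolving "localhost" stays on the local machine.
print(resolve("localhost"))
```

Swapping in a real domain such as `spectrum.ieee.org` would send the query out to that domain's authoritative DNS provider, which is exactly the infrastructure the Dyn attacks overwhelmed.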
To disrupt this process, says Sanjay Goel, a professor of information technology at the State University of New York (SUNY) at Albany, the bot-herders probably sent tons of translation requests directly to Dyn’s servers by looking up the servers’ IP addresses. They could have also simply asked the bots to send requests for Amazon.com and Twitter.com to cause similar issues. Attacking a DNS or a content delivery provider such as Dyn or Akamai in this manner gives hackers the ability to interrupt many more companies than they could by directly attacking corporate servers, because several companies share Dyn's network.
In Dyn’s case, it has built its Managed DNS on an architecture called Anycast, in which any particular IP address for a server in its system can actually be routed through servers in more than a dozen data centers. So, if the IP address of one server is targeted, 10 others may still be able to handle the normal traffic while it's besieged with bot requests. Art Manion, a technical manager at Carnegie Mellon University’s Software Engineering Institute, says this system should make Dyn more resilient to DDoS attacks, and the company has touted it as highly secure.
Dyn said on Friday in an update to its website that the first attack mainly impacted services in the “US East.” The Anycast network includes data centers in Washington, D.C., Miami, and Newark, N.J., as well as in Dallas and Chicago, though it’s not clear whether these locations were specifically targeted.
Even in the affected region, only certain users experienced issues. One reason could be that other users' browsers had previously used Dyn to locate the specific server they needed to retrieve, say, Twitter.com. Because that information is now cached in their browsers, those users can bypass Dyn to fetch the desired content, so long as the servers that store Twitter’s website are still functioning.
Another reason for the inconsistent impact could be a common mechanism for handling DDoS attacks: simply dropping every fifth request from the queue in order to relieve the network of traffic. The result: Some requests from legitimate users wind up being dropped along with those from bots.
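A toy load-shedder mirroring the drop-every-fifth mitigation described above makes the side effect obvious: the dropped requests are chosen by position in the queue, not by sender, so legitimate users lose out alongside the bots.

```python
def shed(requests, drop_every=5):
    """Drop every Nth request from the queue, regardless of who sent it."""
    return [r for i, r in enumerate(requests, start=1) if i % drop_every != 0]

# A mix of bot and real traffic looks the same to the shedder.
incoming = [f"req-{i}" for i in range(1, 11)]
print(shed(incoming))  # req-5 and req-10 are gone, bot or not
```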
Once an attack begins, companies can bring backup servers online to manage the blizzard of requests. Victims can also work with Internet service providers to block the IP addresses of devices generating the most traffic, since those devices are likely part of the botnet. "You start blocking the different addresses where it's coming from, so depending on how massive the botnet is, it may take some time," says SUNY Albany's Goel.
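The blocking approach Goel describes amounts to ranking source addresses by request volume and cutting off the heaviest senders. A minimal sketch, using made-up addresses and a hypothetical request log:

```python
from collections import Counter

def top_talkers(request_log, threshold):
    """Return source IPs whose request count meets the threshold;
    during a DDoS, these are the addresses most likely to be bots."""
    counts = Counter(ip for ip, _path in request_log)
    return {ip for ip, n in counts.items() if n >= threshold}

log = [("10.0.0.1", "/"), ("10.0.0.1", "/"), ("10.0.0.1", "/"),
       ("192.168.5.7", "/home")]          # made-up addresses
print(top_talkers(log, threshold=3))      # {'10.0.0.1'}
```

As Goel notes, the bigger the botnet, the more addresses clear the threshold and the longer this process takes.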
Even with state-of-the-art protections and mitigation strategies, companies are limited by the amount of bandwidth they have to handle such sudden onslaughts. “Ultimately, Akamai has total x amount of bandwidth, and if the attacker is sending x-plus-10 traffic, the attacker still wins,” says Carnegie Mellon's Manion. “It mathematically favors whoever has more bandwidth or more traffic, and the attackers today can have more traffic.”
Dyn’s global network manages over 500 billion queries a month, so the culprits would have had to send many millions or even billions of requests simultaneously in order to stall it. Manion says that to prevent DDoS attacks, companies must address root causes such as poor IoT security, rather than scrambling to stop them once they’ve begun.
Modern computers still lack the capability to find the best solution for the classic “traveling salesman” problem. Even finding approximate solutions is challenging. But finding the shortest traveling salesman route among many different cities is more than just an academic exercise. This class of problems lies at the heart of many real-world business challenges such as scheduling delivery truck routes or discovering new pharmaceutical drugs.
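The difficulty is easy to see in miniature. A brute-force exact solver works for a handful of cities, but the number of possible routes grows factorially with the city count, which is why exact solutions for large instances remain out of reach. A sketch with a made-up four-city distance matrix:

```python
from itertools import permutations

def shortest_tour(dist):
    """Exhaustively check every round trip starting and ending at city 0.
    Feasible only for tiny instances: n cities means (n-1)! routes."""
    n = len(dist)
    best_len, best_route = float("inf"), None
    for order in permutations(range(1, n)):
        route = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if length < best_len:
            best_len, best_route = length, route
    return best_len, best_route

# Made-up symmetric distance matrix for four cities.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(shortest_tour(dist))  # (18, (0, 1, 3, 2, 0))
```

Real-world problems like delivery routing involve thousands of stops, where exhaustive search is hopeless and even good approximations take serious computational effort.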
As Fei-Fei Li sees it, this is a historical moment for civilization fueled by an artificial intelligence revolution. “I call everything leading up to the second decade of the twenty-first century AI in-vitro,” the Stanford computer science professor told the audience at last week’s White House Frontiers Conference. Heretofore, the technology was being fundamentally understood, formulated, and tested in labs. “At this point we’re going AI in-vivo,” she said. “AI is going to be deployed in society on every aspect of industrial and personal needs.”
It’s already around us in the form of Google searches, voice-recognition, and autonomous vehicles. Which makes this a critical time to talk about diversity.
The lack of diversity in AI is representative of the state of computer science and the tech industry in general. In the United States, for example, women and ethnic minorities such as African-Americans and Latinos are especially underrepresented. Just 18 percent of computer science grads today are women, down from a peak of 37 percent in 1984, according to The American Association of University Women. The problem is worse in AI. At the Recode conference this summer, Margaret Mitchell, the only female researcher in Microsoft’s cognition group, called it “a sea of dudes.”
But the need for diversity in AI is more than just a moral issue. There are three reasons why we should think deeply about increasing diversity in AI, Stanford’s Li says.
The first is simply practical economics. The current technical labor force is not large enough to handle the work that needs to be done in the fields of computing and AI. There isn’t much in the way of specific numbers on diversity in AI, but anecdotal evidence suggests they would be dismal. Take, for instance, Stanford’s computer science department. AI has the smallest percentage of women undergrads, at least as compared to tracks like graphics or human-computer interaction, Li points out. Worldwide, the contribution of automation and machine learning to GDP is expected to rise. So it’s really important that more people study AI, and that they come from diverse backgrounds. “No matter what data we look at today, whether it’s from universities or companies, we lack diversity,” she says.
Another reason diversity should be emphasized is its impact on innovation and creativity. Research repeatedly shows that when people work in diverse groups, they come up with more ingenious solutions. AI will impact many of our most critical problems, from urban sustainability and energy to healthcare and the needs of aging populations. “We need a diverse group of people to think about this,” she says.
Last, but certainly not the least, is justice and fairness. To teach computers how to identify images or recognize voices, you need massive data sets. Those data sets are made by computer scientists. And if you only have seas of (mostly white) dudes making those data sets, biases and unfairness inadvertently creep in. “Just type the word grandma in your favorite search engine and you’ll see the bias in pictures returned,” Li says. “You’ll see the race bias. If we’re not aware of the bias of data, we’re going to start creating really problematic issues.”
What can we do about this? Bring a humanistic mission statement to the field of AI, Li says. “AI is fundamentally an applied technology that’s going to serve our society,” she says. “Humanistic AI not only raises the awareness of the importance of the technology, it’s a really important way to attract diverse students, technologists and innovators to participate.”
For centuries, technological innovation has created jobs and improved standards of living. Artificial intelligence might change that. For starters, AI-driven automation is not going to treat workers equally. A recent White House report called Preparing for the Future of Artificial Intelligence acknowledges that AI could make low- and medium-skill jobs unnecessary, and widen the wage gap between lower- and higher-educated workers.
The good news is that policymakers and technology experts are thinking about this, and instituting plans aimed at avoiding the “Robots are going to take all of our jobs!” doomsday scenario. Academics and industry practitioners discussed AI’s job impact at the White House Frontiers Conference last week. And they were confident and optimistic about our ability to adapt.
“The best solutions are always going to come from minds and machines working together,” said Andrew McAfee, co-director of the MIT Initiative on the Digital Economy, and author of “The Second Machine Age.” But that balance of minds and machines won’t always be the same. In five years, that balance will be totally different in, say, customer service and driving.
The good news is that the U.S. economy is really good at creating new jobs once old ones get automated. As an example, McAfee pointed out that the year of peak manufacturing employment in the United States was 1979. Every year since, the number of people working in the industry has gone down even though output goes up. “Those people didn’t become unemployed and their families didn’t starve,” he said.
Late last month, Amazon, Facebook, Google, IBM, and Microsoft announced that they will create a non-profit organization called Partnership on Artificial Intelligence. At the White House Frontiers Conference held at Carnegie Mellon University today, thought leaders from these companies explained why AI has finally arrived and what challenges lie ahead. (Also read the White House’s report on the future of AI released yesterday.)
While AI research has been going on for more than 60 years, the technology is now at an inflection point, the panelists agreed. That has happened because of three things: faster, more powerful computers; critical computer science advances, mainly statistical machine learning and deep learning techniques; and the massive information available due to sensors and the Internet of Things.
Google’s DeepMind artificial intelligence lab does more than just develop computer programs capable of beating the world’s best human players in the ancient game of Go. The DeepMind unit has also been working on the next generation of deep learning software that combines the ability to recognize data patterns with the memory required to decipher more complex relationships within the data.
This year’s Nobel Prize in Physics has been awarded to three physicists, “for theoretical discoveries of topological phase transitions and topological phases of matter.” Two of the scientists uncovered why the spins of atoms inside particular kinds of magnets form messy patterns at low temperatures. This theoretical work, performed in the 1970s, is still leading engineers to develop better and more efficient superconductors.
Michael Kosterlitz, now at Brown University, and David J. Thouless, now at the University of Washington, modeled 2-D layers of ferromagnets—the kind of magnets that stick to the fridge—at low temperature. Their thought experiments indicated that the atomic spins were not fully aligning over a long distance. In other words, the spins did not come together to form one big bar magnet.
They used the concept of vortices—pockets of atoms inside magnets whose spins are oriented in a way that makes the pocket resemble the eye of a hurricane—to explain the effect. These vortices change the spins of nearby atoms.
The Nobel Prize winners “were really the first to use vortices to explain something that’s very profound in condensed matter physics,” says Michael Lawler, a theoretical physicist at Binghamton University in New York who studies magnetism and superconductivity.
At a press conference Tuesday, Kosterlitz said of his Nobel work: “There aren’t real practical applications and it’s not going to lead to any fancy new devices” because most devices are not two-dimensional.
Yet Lawler says that after the discovery, physicists started looking at other special materials where organization becomes disrupted. In particular, they looked at superconductors—materials that don’t resist the flow of electricity and allow large currents to pass on a relatively small wire.
Promising high-temperature superconductors are made of layers of 2-D material, he says. Inside superconductors, vortices take the form of whirlpools of electrons and have a disorder-inducing effect.
Understanding the vortex mechanism is useful, Lawler says, in part because it helps researchers figure out how vortices introduce resistance in a superconductor.
Removing the vortices allows engineers to optimize superconductors’ performance, he says, so cables could someday deliver more power to more people. As an example, research in 2008 revealed that tightly coupling the layers of high-temperature superconducting material generates 3-D vortices, which don’t move around as much as 2-D vortices. The result: They don’t introduce as much resistance.
Besides Kosterlitz and Thouless, who also studied conductance with electrically conducting layers, Duncan Haldane was recognized for his studies of small chains of magnets. The prize was awarded to the researchers for their use of topology: mathematics that describes global relationships that stay the same when local relationships between elements change.
For those of us who make a living solving problems, the current deluge of big data might seem like a wonderland. Data scientists and programmers can now draw on reams of human data—and apply them—in ways that would have been unthinkable only a decade ago.
But amid all the excitement, we’re beginning to see hints that our nice, tidy algorithms and predictive models might be prone to the same shortcomings as the humans who create them. Take, for example, the revelation that Google disproportionately served ads for high-paying jobs to men rather than women. And there’s the troubling recent discovery that a criminal risk assessment score disproportionately flagged many African Americans as higher risk, sometimes resulting in longer prison sentences.
Mathematician and data scientist Cathy O’Neil has a name for these wide-reaching and discriminatory models: Weapons of Math Destruction. In her new book by the same name, she details the ways that algorithms often perpetuate or even worsen inequality and injustice.
We spoke to O’Neil last week during a Facebook Live session to find out how programmers and data scientists can ensure that their models do more good than harm.
Here are a few key takeaways:
1. Recognize the Signs of a “WMD”
A signature of a Weapon of Math Destruction is that it’s used to determine some critical element in the lives of many people. We’re already using algorithms to sort resumes for job openings, automatically schedule shifts for service industry workers, decide the price of insurance or interest rates on a loan, or even to help determine how long a person will spend in jail when convicted of a crime. Because these algorithms affect crucial outcomes for millions of people, they have the potential to do widespread damage.
They’re Secret or Unaccountable
The people most affected by WMDs often don’t understand the rubric by which they’re being scored, or even that they’re being scored in the first place. The methodology behind them is often a “trade secret,” protecting it from public scrutiny. While many companies argue that this keeps people from learning the rules and gaming the system, the lack of transparency also means there’s no way to check whether the score is actually fair. Machine learning algorithms take this one step further; while they’re powerful tools for finding correlations, they’re also often black boxes, even to the people who create them.
Weapons of Math Destruction have a way of creating their own reality and then using that reality to justify their model, says O’Neil. An algorithm that, say, targets financially vulnerable people for predatory loans creates a feedback loop, making it even harder for them to get out of debt. Similarly, a model that labels a first-time drug offender as higher-risk because he comes from a high-crime neighborhood potentially makes that problem even worse. If his high risk score results in a longer jail sentence, he’ll have fewer connections to his community and fewer job prospects once he’s released. His score becomes a self-fulfilling prophecy, actually putting him at a greater risk of reoffending.
2. Realize There Is No Such Thing as an “Objective Algorithm”
One of the things that makes big data so attractive is the assumption that it’s eliminating human subjectivity and bias. After all, you’re basing everything on hard numbers from the real world, right? Wrong. Predictive models and algorithms, says O’Neil, are really just “opinions embedded in math.” Algorithms are written by human beings with an agenda. The very act of defining what a successful algorithm looks like is a value judgement; and what counts as success for the builders of the algorithm (frequently profit, savings, or efficiency) is not always good for society at large. Because of this, O’Neil says, it’s important for data scientists to look at the bigger picture. Who are the winners in my algorithm—and even more importantly—what happens to the losers?
3. Pay Attention to the Data You’re Using
There’s another reason that algorithms aren’t as trustworthy as we might think: The data they draw on often comes from a world that’s deeply prejudiced and unequal. Crime statistics might seem objective—that is, until you realize that, for example, the mechanisms of the U.S. criminal justice system have been applied unfairly to target minorities throughout its entire history. That bias shows up in crime data. Researchers know that black and white people use marijuana at almost identical rates, but black teenagers are much more likely to be arrested for marijuana possession. The disparity in the numbers has much more to do with systemic racial profiling and a ramped up police presence in historically black neighborhoods than it does with actual levels of criminality.
We’ve made the decision as a society to stamp out discrimination based on race, gender, sexual orientation, or disability status—and fortunately, most data scientists know to be very careful when using these attributes to categorize people or model behavior. But data from the real world is often fraught with less-obvious proxy variables that are essentially stand-ins for those characteristics. Zip codes, for example, are an easy proxy for race, thanks to decades of the discriminatory housing practice called redlining.
4. Get Honest About What You’re Really Modeling
Human behavior is messy, which often means that direct measurements of the attributes we’re trying to model (like criminality, trustworthiness, or fitness for a job) don’t actually exist. Because of this, data scientists often rely on other variables they believe might correlate with what they’re trying to measure.
Car insurance companies, for example, use credit scores as a way to determine how reliable a driver is. At first glance it sounds reasonable to assume that a person who regularly pays her bills on time might be more conscientious or responsible. But strangely, Consumer Reports recently discovered that people with low credit scores and clean driving records were being charged much more for car insurance than people with high credit scores and DUIs on their driving records.
This, of course, is nonsense. Having a previous DUI is a much better indicator of a driver’s likelihood of getting into an accident. But O’Neil asserts that there might be a hidden reason the insurance companies continue to incorporate credit score into their models: it’s a direct measurement of financial vulnerability. Drivers with low credit scores don’t have as much leverage to shop around for lower rates, and a person who’s desperate for insurance is often willing to pay much more to get it.
5. Examine and Systematically Test Your Assumptions
Even well-intentioned algorithms can have flawed assumptions built in. For example, the recidivism risk score mentioned earlier is an attempt to make communities safer by locking up potentially violent repeat offenders and releasing those who are deemed a lower risk. Other intended benefits would be reducing the prison population and making the justice system more fair. But once we lock people away, says O’Neil, we treat prisons as a black box and stop asking questions.
Online giants like Amazon.com take the opposite approach; learning and experimentation are built into their business model. Amazon has a dedicated data laboratory where researchers constantly reexamine every aspect of the consumer experience, finding places along the pipeline where customers get confused or frustrated, or can’t find what they need. This feedback allows Amazon to continuously learn and tweak its online environment to maximize profit.
If we truly wanted to optimize our criminal justice system for community safety, says O’Neil, we’d continuously be running controlled experiments: Does putting someone behind bars with other criminals make them more or less likely to commit a crime upon release? How beneficial are general-equivalency (alternative high school) diploma programs? What is the effect of solitary confinement? Of sexual abuse? How much does it cost to treat someone for a mental disorder, versus repeatedly locking him away?
6. Take The Modelers’ Hippocratic Oath:
Eventually we’ll need laws and industry standards that can keep pace with this technology and require a level of transparency from companies about how they’re using data. It might even require mandatory fairness audits of important algorithms. But in the meantime, a disproportionate amount of the responsibility falls to programmers. Awareness of the issue is a crucial first step. A good way to start is by taking this pledge, originally written by Emanuel Derman and Paul Wilmott in the wake of the 2008 financial crisis:
∼ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.
∼ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
∼ I will never sacrifice reality for elegance without explaining why I have done so.
∼ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.
∼ I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.
IEEE Spectrum’s general technology blog, featuring news, analysis, and opinions about engineering, consumer electronics, and technology and society, from the editorial staff and freelance contributors.