# Big Data vs. Chip Defects

Even the top chipmakers, with the most advanced manufacturing equipment available, have some “dog tools”: machines that produce wafers with too many defects. A team of statistics experts in Taiwan, drawn from Academia Sinica and the National University of Kaohsiung, has come up with a big data solution that lets Taiwan Semiconductor Manufacturing Company (TSMC) discover those dogs.

The statisticians analyzed TSMC’s raw data on the quality of wafers during trial production runs in 2012 and 2013. When TSMC adopted the resulting statistical model, the company lowered the inferior wafer rate by 11 to 14 percent, according to Shu-Hui Yu, an associate professor at the Institute of Statistics at National University of Kaohsiung.

Yu says that her team identifies defect-causing machines better than TSMC’s internal team devoted to finding dog tools. That team’s results lowered the inferior wafer rate by only 3 percent. (Under pressure from TSMC, Yu recently sent a public note of apology to TSMC for disclosing those figures.)

According to team leader Ching-Kang Ing, a research fellow of the Institute of Statistical Science at Academia Sinica, the system relies on a large number of data sets. “You can imagine that we are finding needles in a haystack,” he says.

Ing says the researchers first select variables using what’s called an orthogonal greedy algorithm. (A greedy algorithm makes the locally best choice at each step in the hope that this will lead to the best overall choice.) They use the algorithm to decide which variables are irrelevant to the process, simplifying the analysis.
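To make the idea of greedy variable selection concrete, here is a minimal Python sketch of an orthogonal greedy search on synthetic data. It is not the team’s model or TSMC’s data: the function name `orthogonal_greedy_select`, the parameter `n_steps`, and the made-up data set (200 “wafers,” 50 candidate variables, two of which actually matter) are all assumptions for illustration. At each step the sketch picks the variable most correlated with what is still unexplained, then refits on everything picked so far.

```python
import numpy as np


def orthogonal_greedy_select(X, y, n_steps):
    """Return indices of n_steps columns of X chosen by an orthogonal greedy search.

    Hypothetical helper for illustration; assumes n_steps <= number of columns.
    """
    X = X - X.mean(axis=0)           # center each candidate variable
    y = y - y.mean()                 # center the response
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for constant columns
    Xn = X / norms                   # unit-length columns so scores are comparable

    used = np.zeros(X.shape[1], dtype=bool)
    selected = []
    residual = y.copy()
    for _ in range(n_steps):
        # Greedy step: score every variable against the current residual.
        scores = np.abs(Xn.T @ residual)
        scores[used] = -np.inf       # never pick the same variable twice
        best = int(np.argmax(scores))
        used[best] = True
        selected.append(best)
        # Orthogonal step: refit on all selected variables, update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected


# Synthetic example (assumed data): only columns 7 and 21 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 3.0 * X[:, 7] - 2.0 * X[:, 21] + 0.1 * rng.normal(size=200)

selected = orthogonal_greedy_select(X, y, n_steps=2)
print(sorted(selected))
```

With a signal this strong, the two steps should pick out columns 7 and 21 and ignore the other 48. Real production data would be noisier, and a practical method would also need a rule for when to stop adding variables, which this sketch leaves out.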

“After the heavy burdens of numerous calculations were lifted, we eventually can conduct analysis precisely,” Ing says.

Ing describes the process with an analogy. Detectives first rank the suspects in a crime (in this case, wafer damage), then eliminate the impossible ones based on the model’s prediction. Finally, the detectives examine the remaining suspects in order to find the one responsible. “What are left are the criminals,” says Ing, who has, since 2011, teamed up with a group at Stanford University, led by statistician Tze-Leung Lai, that uses a similar model.

Chung-Liang Chien, deputy minister of Taiwan’s Ministry of Science and Technology, sees benefits beyond chip manufacturing. Ing’s team is expanding the application of the model to other fields, including environmental monitoring. And “in terms of disease diagnosis, the big data analysis can be applied to the identification of genes causing cancer,” Chien says. “We’ve encouraged more researchers to work on medical and biological problems.”