IBM’s New AI Tool Parses A Tidal Wave of Coronavirus Research

Deep Search uses an advanced cloud-based natural language processing tool

A man holding a marker in front of mathematics.
IBM researcher Peter Staar, an IEEE member, built the Deep Search platform.
Photo: IBM Research

THE INSTITUTE In the race to develop a vaccine for the novel coronavirus, health care providers and scientists must sift through a growing mountain of research, both new and old. But they face several obstacles. The sheer volume of material makes using traditional search engines difficult because simple keyword searches aren’t sufficient to extract meaning from the published research. This is further complicated by the fact that most search engines present research results in visual file formats like PDFs and bitmaps, which are unreadable to typical web browsers.

IEEE Member Peter Staar, a researcher at IBM Research Europe, in Zurich, and manager of the Scalable Knowledge Ingestion group, has built a platform called Deep Search that could help speed along the process. The cloud-based platform combs through literature, reads and labels each data point, table, image, and paragraph, and translates scientific content into a uniform, searchable structure.

The reading function of the Deep Search platform consists of a natural language processing (NLP) tool called the corpus conversion service (CCS), developed by Staar for other information-dense domains. The CCS trains itself on already-annotated documents to create a ground truth, or knowledge base, of how papers in a given realm are typically arranged, Staar says. After the training phase, new papers uploaded to the service can be quickly compared to the ground truth for faster recognition of each element.
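The idea of learning from annotated documents and then comparing new elements against that ground truth can be pictured with a toy example. This is a minimal sketch, not the CCS's actual models: the features (font size, vertical position, word count), the labels, and the function names `train_centroids` and `label_element` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical annotated page elements:
# (font size in points, vertical position from 0 = top to 1 = bottom, word count)
ANNOTATED = [
    ((18.0, 0.05, 12), "title"),
    ((20.0, 0.08, 9), "title"),
    ((10.0, 0.40, 120), "paragraph"),
    ((10.0, 0.65, 95), "paragraph"),
    ((8.0, 0.90, 20), "caption"),
    ((8.5, 0.55, 25), "caption"),
]


def train_centroids(examples):
    """Average the feature vectors of each label to form a toy 'ground truth'."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def label_element(features, centroids):
    """Assign the label whose centroid is closest (features are left unscaled)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = train_centroids(ANNOTATED)
print(label_element((9.0, 0.5, 110), centroids))   # a long, small-font block
print(label_element((19.0, 0.06, 10), centroids))  # a short, large-font block
```

A production system would use far richer features and learned models; the sketch only shows why a set of already-labeled examples makes recognizing new elements a quick comparison.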

Once the CCS has a general understanding of how papers in a field are structured, Staar says, the Deep Search platform presents two options. It can either generate simple results in response to a traditional search query, essentially serving as an advanced PDF reader, or it can generate a report on a specific topic, such as the dosage of a particular drug, with deeper analysis that the group calls a knowledge graph.

“[The] knowledge graph allows us to answer these relatively complex questions that are not able to be answered with just a keyword lookup,” Staar explains.

To keep the data in the platform’s knowledge base up to the highest standards possible, Staar says the team bolsters its corpora with trusted, open-source databases such as DrugBank for chemical, pharmaceutical, and pharmacological drug data and GenBank for established and publicly available genetic sequences.

SEARCH HISTORY

Deep Search is based on a similar platform that Staar built in 2018 for material science and for oil and gas research, fields that both faced a deluge of data. Staar recognized that the same solution could be used to parse the tsunami of data about SARS-CoV-2. The platform was designed to be generic enough to be extended to other domains of research.

“Our goal was to help the medical community with a tool that we already had in our hands,” Staar says. Currently, the COVID-19 Deep Search service supports 460 active users and has ingested nearly 46,000 scientific articles.

The platform can even use search queries to divide results according to scientific camp.

“In the oil and gas business, when different philosophies [on environmental impact] collide, you can say, ‘Okay, if you follow a certain stream of thought, then you might be more interested in papers that are associated with this group of people, rather than with that group,’” Staar says.

If the scientific community is divided on a major attribute of SARS-CoV-2, for example, Deep Search might cluster search results around each camp. When a user searches for that attribute, the platform could analyze the wording of their search string and then guide the user to the cluster of results that most closely aligns with the user’s approach.
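One simple way to picture this routing step is to compare the words in a query with the words that characterize each cluster. The sketch below is an assumption-laden illustration, not Deep Search's method: the two camps, their snippet texts, and the function `route_query` are invented, and plain bag-of-words cosine similarity stands in for whatever analysis the platform actually performs.

```python
import math
import re
from collections import Counter

# Invented snippets standing in for two clusters of papers.
CAMPS = {
    "camp_a": [
        "aerosol particles and indoor ventilation",
        "airborne aerosol spread in closed rooms",
    ],
    "camp_b": [
        "surface contamination and hand hygiene",
        "virus survival on contaminated surfaces",
    ],
}


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route_query(query, camps):
    """Return the camp whose combined vocabulary best matches the query."""
    query_vec = bag_of_words(query)
    profiles = {name: bag_of_words(" ".join(docs)) for name, docs in camps.items()}
    return max(profiles, key=lambda name: cosine(query_vec, profiles[name]))


print(route_query("does ventilation reduce aerosol spread", CAMPS))
```

The design choice here is deliberate simplicity: word overlap is easy to inspect, though it ignores synonyms and phrasing that a real language model would capture.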

STREAMLINING RESEARCH

This isn’t the first time a pressing global health crisis has prompted scientists to try to streamline the publishing process. A 2010 analysis of literature from the 2003 SARS outbreak found that, despite efforts to shorten wait times for both acceptance and publishing, 93 percent of the papers on SARS didn’t come out until the epidemic had already ended and the bulk of deaths had already occurred.

Unlike their counterparts in 2003, however, present-day epidemic researchers have benefitted from the advent of preprint servers such as bioRxiv and medRxiv, which enable uncorrected articles to be shared digitally regardless of acceptance or submission status. Preprints have been around since the early 1990s, but the public health emergency of SARS-CoV-2 prompted a new surge in popularity for the alternative publishing practice, as well as a new round of concern over its impact.

Deep Search capitalizes on the preprint trend to further reduce obstacles to sharing the content of research papers. But it also aims to address one of the chief criticisms of preprints: that without peer review, the average reader may be unable to distinguish high-quality research from low-quality research. Though every new paper has equal weight in the Deep Search algorithms, the volume of data it ingests allows for statistical comparisons among conclusions. Users can easily see whether a result is consistent with previous findings or seems to be an outlier.
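A statistical comparison of this kind can be sketched with a robust outlier check. The example below is illustrative only: the paper names and reported values are made up, the function `flag_outliers` is hypothetical, and the median-based modified z-score with a 3.5 cutoff is a common rule of thumb chosen here as an assumption, not a description of the Deep Search algorithms.

```python
from statistics import median

# Made-up values for the same quantity as reported by five papers.
REPORTED = {
    "paper_1": 2.1,
    "paper_2": 2.4,
    "paper_3": 2.2,
    "paper_4": 2.3,
    "paper_5": 5.9,
}


def flag_outliers(reported, threshold=3.5):
    """Return the names whose value sits far from the median of all values.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Each paper counts equally in the calculation.
    """
    values = list(reported.values())
    med = median(values)
    mad = median([abs(value - med) for value in values])
    if mad == 0:
        return []  # no spread at all, so this score cannot separate anything
    return [
        name
        for name, value in reported.items()
        if abs(0.6745 * (value - med) / mad) > threshold
    ]


print(flag_outliers(REPORTED))
```

Because the median and MAD are barely affected by a single extreme value, one unusual result does not distort the baseline it is compared against.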

These relational functions, in which Deep Search sorts, links, and compares data as it returns results, constitute the platform’s signature advantage, Staar says. Developing a treatment molecule, for example, might start with a search to determine which gene to target within the viral RNA.

“If you understand which genes are important, then you can start understanding which proteins are important, which leads you to which kinds of molecules you can build for which kinds of targets,” he says. “That’s what our tool is really built for.”
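The chain Staar describes, from genes to proteins to candidate molecules, can be pictured as a walk over linked facts. The following is a minimal sketch under stated assumptions: the triples use placeholder names rather than real biomedical entities, and the relation names, `build_index`, and `molecules_for_gene` are hypothetical, not the Deep Search schema or API.

```python
from collections import defaultdict

# Placeholder (subject, relation, object) facts; none are real findings.
TRIPLES = [
    ("GENE_A", "encodes", "PROTEIN_X"),
    ("GENE_B", "encodes", "PROTEIN_Y"),
    ("MOLECULE_1", "binds", "PROTEIN_X"),
    ("MOLECULE_2", "binds", "PROTEIN_X"),
    ("MOLECULE_3", "binds", "PROTEIN_Y"),
]


def build_index(triples):
    """Index the facts so edges can be followed in either direction."""
    forward = defaultdict(set)   # (subject, relation) -> objects
    backward = defaultdict(set)  # (relation, object) -> subjects
    for subject, relation, obj in triples:
        forward[(subject, relation)].add(obj)
        backward[(relation, obj)].add(subject)
    return forward, backward


def molecules_for_gene(gene, forward, backward):
    """Two-hop query: gene -> proteins it encodes -> molecules that bind them."""
    found = set()
    for protein in forward.get((gene, "encodes"), set()):
        found |= backward.get(("binds", protein), set())
    return sorted(found)


forward, backward = build_index(TRIPLES)
print(molecules_for_gene("GENE_A", forward, backward))
```

A keyword search for "GENE_A" would never surface the molecules, since no single fact mentions both; the answer only appears once the two facts are joined through the shared protein.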
