System Ingests AT&T Network Logs to Reveal Root Cause of Errors

By analyzing millions of error messages in AT&T’s network data, researchers developed an algorithm that could help carriers detect problems faster


Behind the easy connectivity that much of the world enjoys, commercial networks are hard at work establishing connections, authenticating users, and verifying services. When an error occurs, it can be hard for providers to pinpoint the root cause because an error message may be generated in a different spot within a network than the place where the actual error happened.

To home in on the source of such errors, researchers have analyzed error logs related to millions of messages exchanged through AT&T’s network. The group’s aim was to learn about latent events in particular. Latency errors may cause delays in call propagation and transmission, disconnection issues, and network bottlenecks. Each error event can produce a sequence of messages whose type and frequency could vary based on the latency between the various network elements, network load, and other events.

“We have come up with a set of algorithms that can group the raw error data into events described by important keywords,” says Siddhartha Satpathi, a PhD candidate in electrical engineering at the University of Illinois at Urbana-Champaign. “We are not identifying the cause of the events, we are simply separating the messages into groups, where each group consists of messages generated by a single event. Additionally, we identify the key messages which are associated with each event.” Then, a network operator can use these groupings to identify the root cause.

In a real network, Satpathi explains, errors that come from different geographical locations could be related to one another, and sometimes one physical error leads to thousands of error messages. He gives the example of Alice, from Illinois, who is visiting California and places a phone call to Bob in New York. Before connecting the call, the base station close to Alice in California needs to verify her credentials, which are held at her home station in Illinois.

Once that’s done, the call is routed through the network from California to New York. If a router breaks down somewhere along that network, it would result in error reports from all the connected networks and locations (California, New York, and Illinois). This group of error messages in the error log is what the researchers called an “event.”

That’s where the new algorithm comes in. The size of the error logs makes it impossible for a human engineer to go through the messages and figure out which ones were caused by the same event.

“Our algorithm groups these messages into few important events,” says Satpathi. “It also outputs some frequently occurring messages in these discovered events. This grouping of messages make the message log human interpretable, and can help an engineer decipher the root cause of the error.” The group recently published its work on network message logs in the journal IEEE/ACM Transactions on Networking.
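To make the second output concrete, here is a minimal Python sketch of the summarizing step only: given messages that have already been assigned to events, it lists the most frequent message types in each. The event labels, message type names, and the `summarize_events` helper are all hypothetical illustrations, not taken from the researchers' code or from AT&T's logs.

```python
from collections import Counter, defaultdict

def summarize_events(labeled_messages, top_k=2):
    """Return the top_k most frequent message types for each event label."""
    counts = defaultdict(Counter)
    for event_id, message_type in labeled_messages:
        counts[event_id][message_type] += 1
    return {
        event_id: [msg for msg, _ in counter.most_common(top_k)]
        for event_id, counter in counts.items()
    }

# Hypothetical, hand-labeled toy log: (event id, message type).
toy_log = [
    ("event_A", "LINK_DOWN"),
    ("event_A", "AUTH_TIMEOUT"),
    ("event_A", "LINK_DOWN"),
    ("event_A", "ROUTE_FLAP"),
    ("event_A", "AUTH_TIMEOUT"),
    ("event_A", "LINK_DOWN"),
    ("event_B", "DISK_FULL"),
    ("event_B", "DISK_FULL"),
    ("event_B", "FAN_ALARM"),
]

summary = summarize_events(toy_log)
for event_id, top_types in summary.items():
    print(event_id, top_types)
```

In practice the hard part is assigning the event labels in the first place, which is what the researchers' algorithm is designed to do; this sketch assumes that step has already happened.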

In their research, Satpathi’s team considered a dataset comprising 97 million messages, of 39,330 types, sent over 15 days. These included syslog texts (raw-text messages that software on a specific network element, such as a server, relay, or base station, sends to a logging server, each with a timestamp and text describing the error) and alarms (which indicate specific fault conditions in a network element). The researchers then applied to this data a two-stage algorithm called Change-point Detection–Latent Dirichlet Allocation (CD-LDA), which uses the existing LDA algorithm as a subroutine.
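The idea behind a change-point stage can be illustrated with a toy Python sketch. It is a simplified stand-in, not the researchers' CD-LDA: it splits a time-ordered stream of message types wherever the mix of types on either side of a candidate split differs most, measured by total variation distance. The function names, the `min_size` and `threshold` parameters, and the sample stream are assumptions made for illustration, and the LDA stage is not shown.

```python
from collections import Counter

def type_distribution(messages):
    """Empirical distribution of message types in a segment."""
    counts = Counter(messages)
    total = len(messages)
    return {msg: count / total for msg, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def find_change_points(messages, min_size=3, threshold=0.5, offset=0):
    """Recursively split where the left and right type mixes differ most.

    A split is accepted only if the distance exceeds `threshold` and both
    sides keep at least `min_size` messages (both values are assumptions).
    """
    best_split, best_dist = None, threshold
    for i in range(min_size, len(messages) - min_size + 1):
        dist = total_variation(
            type_distribution(messages[:i]), type_distribution(messages[i:])
        )
        if dist > best_dist:
            best_split, best_dist = i, dist
    if best_split is None:
        return []
    left = find_change_points(messages[:best_split], min_size, threshold, offset)
    right = find_change_points(
        messages[best_split:], min_size, threshold, offset + best_split
    )
    return left + [offset + best_split] + right

# Hypothetical time-ordered stream: the mix of types shifts halfway through.
stream = ["A", "B", "A", "B", "A", "B", "C", "D", "C", "D", "C", "D"]
change_points = find_change_points(stream)
print(change_points)
```

Each resulting segment could then be handed to a topic model such as LDA as one "document," which is roughly the role the article describes for the second stage; how CD-LDA actually connects the two stages is detailed in the researchers' paper, not here.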

The six hours that it took to run LDA on this dataset could be reduced, Satpathi says, by using faster versions of the LDA algorithm. This makes the study “very scalable,” he adds, for detecting errors on a commercial network.

