Taming the Power Grid

Recent newsworthy wide-area electrical blackouts have raised many questions about the specifics of such events and the vulnerability of interconnected power systems when operated outside of their intended design limits.

Exchange of information stemming from worldwide blackout findings, restorative efforts, and innovations in technology sheds new light on the current conditions, procedures, regulations, and design of power systems. Examination of the root causes, the resulting effects on neighboring systems, and implementation of proven solutions to help prevent propagation of such large-scale events should help us design reliable power delivery infrastructures for today and in the future. Armed with this detailed and fresh perspective, power industry professionals can consider the costly lessons of the past, maintain a library of historical lessons about "What happened and why?" for generations to come, and act as catalysts to help design or revise power systems for heightened reliability.

Although large-scale blackouts are very low probability events, they carry immense costs for customers and society in general as well as for power companies. It is easy to misjudge the risk of such extreme cases, and in particular the financial risk. Financial risk is the product of the associated cost and the probability of occurrence, and both factors are very hard to assess accurately. The need for extensive mitigation strategies against grid congestion and the high cost associated with such improvements, combined with inaccurate probabilistic assessments, have led to risk management not focusing on appropriate, cost-effective mitigation actions. From a broader perspective, a misconception may be formed about the grid reliability or its exposure to large-scale outages.
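The product of cost and probability can be made concrete with a small calculation. The sketch below is illustrative only: the function name and every figure in it are hypothetical and are not taken from any blackout study. It shows how strongly the result depends on a probability estimate that is hard to pin down.

```python
def expected_annual_risk(outage_cost, annual_probability):
    """Financial risk as cost multiplied by probability of occurrence."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    return outage_cost * annual_probability


# Hypothetical figures, chosen only to illustrate sensitivity.
cost = 5_000_000_000          # assumed societal cost of one wide-area blackout
for p in (0.01, 0.02, 0.05):  # three equally defensible-looking estimates
    risk = expected_annual_risk(cost, p)
    print(f"p = {p:.2f} -> expected annual risk = {risk:,.0f}")
```

Under these assumed numbers, the computed risk changes fivefold across the three probability estimates, which is enough to change whether a mitigation investment appears justified.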

Understanding the complexities of the interconnected power grid and the need for proper planning, good maintenance, and sound operating practices are key to preventing the problems of tomorrow for this modern-day necessity. This article offers practical explanations by experienced power industry navigators (from utilities and vendors, consultants, and academics, all with international reputations) on the leading causes of widespread blackouts and how best to prevent them in order to craft a steady course along the journey toward higher levels of reliability in future power generation and delivery.

Challenges and opportunities in taming the grid's wide area blackouts

Our lives continue to be improved by evolutions in technology, which include precision surgical equipment for use in critical operations; the revolution of information exchange through Internet and wireless technology that affects every aspect of our personal and professional lives; automated banking at any time of day; use of electric rail systems to reduce harmful emissions; and improved home appliances. Thanks to affordable costs and marketing concepts, many of the technological innovations achieved in the past quarter century have readily found their way into our daily lives.

The modern-day amenities and our respect for the environment have also increased our dependence on energy, hence our expectations for uninterrupted reliable power. Twenty-first-century equipment is entering our homes. Robotic appliances that perform all household duties are a reality within reach. Imagine: we arrive home and nothing is done because electricity is unavailable. That is, if we manage to get home at all, given the traffic jams caused by traffic lights not working or the rail system not running.

Modern technology is the catalyst driving power delivery and demanding grid reliability, and the markedly increased dependence on availability has raised the bar on human expectation. However, the demand for the availability of power for much of the modern-day equipment has not been systematically and uniformly considered.

Let us consider our willingness to pay the price for availability. We are willing to pay more for a laptop computer with rechargeable backup battery than for a desktop computer so that we have a computer available when traveling. One can add the costs and environmental impact for discharging the batteries to further emphasize the price we are willing to pay for availability. Another example is hybrid automobiles: the price for clean-air vehicles continues to drop, yet we are not eager to use them because mileage between fueling (availability) is not as good as with regular cars or because charging takes much longer than gasoline fueling. The above analogy can be applied to power systems as well. There is a price for availability, and one can apply a fraction of the price difference to everyday conveniences we have become accustomed to in order to realize the hefty price we would be paying when availability becomes top priority.

The North American and the European grid systems that experienced blackouts in 2003 are among the most reliable systems worldwide. However, the same systems are subject to a host of challenges: aging infrastructure, need for generation sitings near the load centers, transmission expansion to meet growing demand, and regulatory pressures.

One of the challenges facing the power industry today is the balance between reliability, economics, the environment, and other public-purpose objectives to optimize transmission and distribution resources to meet the demand. These issues must be addressed to move the electrical system into the 21st century.

Resources and transmission adequacy are necessary components of a reliable and economic supply. Although reliability and market economics are sometimes driven by conflicting policies and incentives, they cannot be separated when the objective is reliability and availability. Today, grid planning faces an extremely difficult task given the challenge to achieve resource adequacy in our restructured industry, as market economics and local concerns often drive the decision for generation facility siting far away from major load centers.

Equally difficult is planning for an adequate transmission system when the location of future generation facilities is uncertain and the lead time for transmission construction is several times greater than that of the generation siting process and implementation.

It is more important than ever to find ways to project transmission and distribution growth, identify cost-effective solutions to deploy, and to determine criteria to be applied to guide prudent investment decisions. Some of the key areas to address are:

The need for regulatory bodies to step up and address matters such as defining and enforcing the standards for reliability, streamlining the right-of-way access for transmission, vegetation management vs. environmental impact, and the recovery on stranded investments to name a few items.

The price for reliability, the costs and risks that transmission owners and customers are willing to assume. The power industry is accustomed to optimizing investments and evaluating return on investments based primarily on financial aspects of trading energy and serving load within certain reliability criteria. This is done without considering financial aspects of unavailable energy (from undue service interruptions) due to low reliability and slow restoration that incurs significant costs to society, as recent blackouts have shown. This is an incomplete financial model that results in sub-optimal investment strategies.

Large regional geographic areas should be included in the scope of transmission planning and decision-making. Identify the true beneficiaries and how costs are to be shared.

Quick restoration. It is incomprehensible with today's technology to accept power restoration lasting in excess of 12 hours.

Electricity is the key resource for our society; however, it has not been a priority for strategic planning. Cities, households, and industries will all suffer if the approach does not change and the identified major action plans are not implemented.

Pre-outage conditions and symptoms of blackouts

The grid is a tremendously complex system, and the interconnections that allow us to benefit from higher reliability and lower costs have also caused the domino failures experienced in many parts of the world in recent years. Although there is a tendency to point at one or two significant events as the main reasons for triggering cascading outages, major blackouts are typically caused by a sequence of multiple, low-probability contingencies with complex interactions.

Low-probability sequential outages are not anticipated by system operators or may develop too fast for human intervention, thus rendering the power system more susceptible to wide-area blackouts. As the chain of events at various locations in the interconnected grid unfolds, operators may not be able to act quickly enough to mitigate fast-developing disturbances. Operators are exposed to a flood of alarms and, at times, incomplete information. There are many factors to consider when human actions are expected, such as:

Simply making a mistake. All humans do.

A desire to simply allow a little bit longer time in case the system recovers automatically.

Concerns surrounding taking unpopular actions such as disconnecting customers (may later be subject to a line of questioning or proposed training if these actions were inadequate or exceeded the required response).

Not having the authority to take the necessary actions.

Power systems are designed to allow for reliable power delivery in the absence of one or more major pieces of equipment such as lines, transformers, or bulk generation, commonly referred to as contingency conditions. For example, North American Electric Reliability Council (NERC) Planning Standard IA sets forth the performance requirements a system must meet for various contingencies. The complexity of the grid operation, however, makes it difficult to study the permutation of contingency conditions that would lead to perfect reliability at reasonable cost. An accurate sequence of events is difficult to predict because there is practically an infinite number of operating contingencies. Furthermore, with system changes--e.g., independent power producers selling power to remote regions, load growth, new equipment installations that cause significant changes in power flow--these contingencies may differ significantly from the expectations of the original system designers. There have also been cases of system disturbances caused by scheduled equipment outages when the electrical system has not been adjusted, for continued safe operation, prior to the equipment being removed--again pointing to the complexity of the power grid.
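The difficulty of studying every permutation of contingencies can be illustrated by counting them. The sketch below is a simplification under stated assumptions: it counts only unordered combinations of simultaneous element outages, ignores the order of events and the operating state, and uses a hypothetical system size. The function name is illustrative.

```python
from math import comb


def contingency_count(n_elements, k_outages):
    """Number of distinct ways to have exactly k of n elements out of service."""
    return comb(n_elements, k_outages)


# Hypothetical system size, for illustration only.
n = 10_000
for k in (1, 2, 3):
    print(f"{k} simultaneous outage(s): {contingency_count(n, k):,} cases")
```

For this assumed size, single outages give 10,000 cases, double outages about 50 million, and triple outages about 167 billion, before any variation in load level, generation dispatch, or event timing is considered.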

History has shown us that both unscheduled and scheduled outages have affected power systems' balanced operation, hence signifying the grid complexity during managed conditions. In the case of the August 1996 North America disturbance, a series of equipment was being removed for maintenance in parts of the Western grid when the weather was moderate, yet these pieces of equipment were needed to support transfers in other parts of the Western grid, which was having extremely high temperatures. With the September 2003 disturbance in Europe, several 220- to 400-kilovolt (kV) lines were out of service for maintenance prior to the event. Generally, disturbance propagation involves a combination of phenomena such as:

Cascading line tripping by overloading transmission lines.

Cascading outages that lead to equipment tripping (for example, generators and transformers), contributing further to systemwide outages.

Power system islanding (frequency instability) when the power system separates. Islands are formed, with an imbalance between generation and load, causing the frequency to deviate from the nominal value, leading to more equipment tripping.

Loss of synchronous operation among generators and oscillatory instability, causing self-exciting inter-area-oscillations.

Voltage instability/collapse problems that usually occur when the power transfer is increased because local resources have been displaced by remote resources without the proper installation of needed transmission lines or voltage support devices in the "right" locations.
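The first phenomenon in the list, cascading line tripping by overload, can be sketched with a toy model. This is an illustration under strong assumptions, not a power-flow study: a fixed transfer is assumed to split equally among parallel lines of equal impedance, every line whose share exceeds its rating is assumed to trip at once, and all names and numbers are hypothetical.

```python
def simulate_cascade(transfer_mw, ratings_mw, initial_outage):
    """Toy cascade: equal-impedance parallel lines share a fixed transfer equally.

    Returns the tripping rounds (each a sorted list of line indices) and the
    set of lines still in service when the cascade stops.
    """
    in_service = set(range(len(ratings_mw))) - {initial_outage}
    rounds = []
    while in_service:
        share = transfer_mw / len(in_service)      # flow per remaining line
        overloaded = sorted(i for i in in_service if share > ratings_mw[i])
        if not overloaded:
            break                                  # remaining lines can carry the load
        rounds.append(overloaded)
        in_service -= set(overloaded)
    return rounds, in_service


ratings = [400, 320, 350, 500]                     # hypothetical line ratings in MW
print(simulate_cascade(900, ratings, initial_outage=3))
print(simulate_cascade(1000, ratings, initial_outage=3))
```

With the assumed ratings, losing the strongest line at a 900 MW transfer leaves the other three in service, while the same outage at 1000 MW trips line 1 first and then lines 0 and 2, leaving nothing in service. A modest increase in pre-existing stress turns a survivable outage into a complete loss of the path.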

While wide-area blackouts may appear unpreventable, there are many proven measures that can mitigate their frequency of occurrence and their impact. Wide-area cascading outages could be detected and mitigated in a timely manner by implementing a number of measures both locally and via information from remote locations.

Recent increases in frequency and size of blackouts

Within the last two years, the number of wide-area outages has rapidly increased (blackouts in the Northeast United States and Canada, Italy, Sweden, Denmark, England, Croatia, India, Australia, New Zealand, Greece, etc.), affecting more than 130 million customers. Figure 1 shows some of the previous widespread blackouts and their consequences. As the likelihood of low-probability events escalating into a cascading outage increases, when the grid is already under stress due to pre-existing conditions, one can conclude that power grids are more prone to disturbances than ever.

The recent blackouts have prompted a barrage of international media attention and have served as catalysts in propelling the power industry toward analyzing blackouts and finding solutions to prevent them. After the 2003 blackouts in North America (see Figure 2) and Italy, industry experts have appeared in the spotlight. More than one year after the 14 August blackout in North America, although media attention has significantly declined, the debate is still raging regarding why blackouts happen and whether it is possible to prevent them. And, if possible, how?

By some accounts:

These kinds of outages are consistent with historical statistics and will continue to happen. Some people may compare blackouts to events such as long-term weather forecasting (of hurricanes) or to natural disasters (tsunamis, mudslides and earthquakes) as being difficult to predict and prevent.

Sloppy or infrequent vegetation management, as well as improper or inadequate operator response, are actors in a bigger drama. Massive blackouts are inevitable and, much like megaquakes, will some day level parts of the world.

The more immediate problem may be the industry's investment of less than 0.5 percent in research and development (R&D), one of the lowest rates for any industrial sector.

A power system is composed of hundreds of thousands of pieces of equipment, from bulk autotransformers and high-voltage transmission systems to light bulbs. It has been suggested that one could not get a computer big enough to model a complex system--the Eastern Interconnection, for example--and perform the planning studies.

Large blackouts occur because the grid isn't forcefully engineered to prevent them. Purposely weakening the grid can reduce large blackouts but would increase the frequency of smaller ones.

This paper comments on the above topics and recommends practical solutions to better harness the seemingly untamed grid.

Chifong Thomas, principal transmission planning engineer at Pacific Gas and Electric Co. and the chair of the Technical Studies Subcommittee of the Western Electricity Coordinating Council (WECC), points out that if large blackouts occur anyway, as they are as hard to predict as earthquakes, then nothing anyone could do would make any difference. That contradicts the premise that industry underinvestment in R&D is the immediate problem. If the first premise is true, the second one is irrelevant.

Thomas also points out that if grid planning is done comprehensively and correctly, then operators with proper training should have time to respond to contingencies and/or to limit their impact.

Mayer Sasson, principal advisor, Electric Markets Policy Group at Consolidated Edison Co. of New York, highlights the vulnerability of the interconnected grid by pointing out that it may take only one member of a control area to be operating outside of the reliability limits for any reason, including unintentionally, to cause a cascading outage that can quickly propagate to neighboring interconnected system-control areas. This underscores the criticality of complying with mandatory reliability rules.

Bogdan Kasztenny, power system protection and control application manager at General Electric Co. (GE), Canada, raises the important issue of human factor involvement in the last few critical hours before any major event. Says Kasztenny: "Even in a poorly designed and underinvested system, operators could sometimes salvage an event that seems to be disastrous, or collapse an otherwise quite secure situation in a strong and reliable system. It is not so with earthquakes. Power systems are manmade creations run by humans. Operational procedures, availability and accuracy of real-time information, and adequate training including dry runs on simulators are technical means that improve the response of the operators."

Statistically, a sequence of low-probability contingencies with complex interactions causing a blackout may not happen in the immediate future but will eventually take place. Although the possibility of widespread blackouts appearing in the future cannot be systematically ruled out, that does not mean that their occurrence cannot be reduced, their propagation cannot be arrested (or size or consequences reduced), and restoration sped up. Recent increases in the number and size of blackouts raise questions as to whether it is possible to predict if, where, and when the next ones will occur. Event analyses of the wide-area disturbances clearly show that power systems are more stressed than in the past, that capacity reserves have been compromised in some form or fashion, that parts of the interconnected systems are operated outside of limits, or that some reliability measures are compromised, prior to the start of the chain of events.
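The statement that a rare sequence of events will eventually take place can be quantified with elementary probability. The sketch below assumes, purely for illustration, that each year is independent and carries the same probability of a major event. Neither assumption holds exactly for a real grid, and the 2 percent figure is hypothetical.

```python
def probability_at_least_once(annual_probability, years):
    """Chance of at least one occurrence over a horizon, assuming independent years."""
    return 1.0 - (1.0 - annual_probability) ** years


p = 0.02  # hypothetical annual probability of a wide-area blackout
for horizon in (1, 10, 30, 50):
    chance = probability_at_least_once(p, horizon)
    print(f"{horizon:>2} years: {chance:.1%}")
```

Under these assumptions an event that looks negligible in any single year becomes more likely than not over a 50-year planning horizon, and reducing the annual probability lowers the long-run figure accordingly.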

Kasztenny emphasizes that the power system is a complex generating, transmitting, and distributing system for a medium that cannot be stored, or buffered, in the reality of very limited redundancy. This makes it quite different from banking, phone, or similar systems. Other systems are subject to brownouts. However, the power system must balance the medium under physical constraints, and is therefore subject to collapse if the constraints are violated. Kasztenny says the industry needs to invest to:

Make the power grid stronger (by adding generation, transmission paths, reactive power support, etc.) and, consequently, limit the stress due to the developing problem.

Make the components more reliable (with proper maintenance, correctly set and coordinated protection settings, etc.) and, consequently, limit the probability of failing under stress.

Make the system defend itself better (through equipment and power system protection schemes) and, consequently, divert and reduce the stress by taking automatic remedial actions.

Power system modeling and analysis

Electrical grids have been characterized as the most widespread interconnected, complex, dynamic systems made by human beings. A power system carries a tremendous amount of electricity that all of us depend on. Although blackouts are difficult to predict and prevent, the notion that it is not possible to simulate the grid's behavior, to identify and address most vulnerabilities, is not accurate.

Carson Taylor, a power system simulation expert with the Bonneville Power Administration, confirms that today's technology allows for detailed modeling of complex power systems. Taylor emphasizes that the Eastern Interconnection has been simulated in great detail with 40 000+ bus models. Simulations with 100 000+ models are feasible. Feasibility increases with IT advancements. Computation is not a large problem. It is scalable by parallel computation on multiple servers; cases can be farmed out to multiple servers. The simultaneous processing method is used for energy management and system dynamic security assessment.
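The idea of farming independent cases out to multiple servers can be sketched in a few lines. This is a minimal illustration under stated assumptions: threads on one machine stand in for separate servers, `run_case` is a hypothetical placeholder rather than a real simulation engine, and the case data are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(case):
    """Placeholder for one contingency study; a real one would call a simulation engine."""
    name, loading_pct = case
    return name, "violation" if loading_pct > 100.0 else "secure"


def run_study(cases, max_workers=4):
    """Evaluate independent cases concurrently and collect results by case name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_case, cases))


# Hypothetical cases: (name, worst post-contingency line loading in percent).
cases = [("line_A_out", 87.0), ("line_B_out", 104.5), ("xfmr_C_out", 99.9)]
print(run_study(cases))
```

Because each case is independent of the others, the work scales with the number of workers, which is the property the simultaneous processing approach relies on.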

Prof. Göran Andersson, of the Eidgenössische Technische Hochschule (ETH) in Zurich, a noted expert in power system dynamics, concurs with Taylor and brings up an important issue that the challenge is not technical--i.e., related to modeling or computer capacity. Rather, the challenge is data management and its interpretation. Andersson also underlines that "statistical methods can give some information and insight concerning the sizes and frequencies of blackouts--which is of value. However, the problem is the calibration of the frequency scale, since this requires a detailed modeling of the interactions. The statistical insights, however, do not provide to any large extent the guidance for avoiding blackouts to the power-system professionals operating the system. In terms of modeling, the details of the system and the interactions between different components and subsystems are indispensable."

Both Andersson and Taylor emphasize that the best practice is to continually improve and update models and compare simulations with real power-system response. This is not unique for power systems and is continuously being done in industry and academia. In addition, much progress has been made over the years, and even if predictions from simulations are problematic, efforts to model, simulate, and validate performance provide invaluable insight for reliable power system operation.

Paul Myrda, director of operations at Trans-Elect Inc., based in Reston, Va., points to the "Computing the Cosmos" article [IEEE Spectrum, August 2004] which headlines the 4 200 000 000 000 (4.2 trillion) calculations per second the Virgo Consortium supercomputer can perform. Myrda finds it incomprehensible and typical of the pervasive attitude within the industry that focuses more on what cannot be done rather than what can and should be done. We have the computational capability to solve the origins of the cosmos, yet haven't studied the power grid.

In conclusion, the power grid can be modeled and studied. However, because the number of contingencies that can occur is practically infinite and the current state is not precisely known, it is not possible to predict disturbance propagation exactly far into the future. Even so, innovations in power-system tools, from planning to monitoring to operation, would help us meet the challenges of 21st-century expectations in reliable power delivery.

Power systems operating outside limits

It has been demonstrated time and again that wide-area blackouts are caused by operation of the interconnected power system outside operating limits or for operating conditions that have not been thoroughly studied. One of the best ways to understand the challenges of power delivery is to liken it to driving the highways in a car. As drivers, most of us have experienced a wide variety of unplanned difficulties and challenges that slowed our travel. In many ways, widespread outages have characteristics similar to highway traffic gridlock. Some of the key aspects of traffic jams are:

Not enough lanes to accommodate growing demand.

As one or two lanes are closed, traffic flow is significantly impacted and in the most severe case of an accident the highway is shut down. This is a circumstance out of the control of those not involved with the accident, yet they are stuck (blackout).

A properly designed system would include major alternate freeways that are easily accessed through detours to minimize the impact of the lane closures.

Repairing old highways may not be enough to solve traffic congestion. At times prudent investment in building new traffic lanes, or completely new thruways, and innovative traffic control systems are needed.

Daniel Karlsson, a system analysis and protection expert at ABB, in Sweden, also compares the expansion of highway traffic with an expansion of the power grid, as both have grown from minor systems to extremely important infrastructures in the same time period, and both continue to grow. However, he points out that the public accepts a much higher degree of failures (e.g. people seriously injured or worse) due to automobile traffic than they accept a widespread power outage, demonstrating the important role of electricity in our daily lives and hence the criticality of reliable power delivery.

Recent blackouts worldwide have caused us to look afresh at our assumptions about the value of reliable electric service and what it takes to keep the lights on. It must be emphasized that the original function of interconnected systems was to form the backbone for the security of supply, and to reach its required high-reliability level at reasonable costs. To this aim, the grid systems were developed in the past several decades with a view to assuring mutual assistance between transmission system participants and/or national subsystems, including common use of reserve capacities, and, to some extent, using sound fundamental rules of electricity to optimize the use of energy resources by allowing exchanges between the systems. This original function has been changed by the recent emphasis on deregulation.

Karlsson brings up a historical perspective to explain how the large interconnected power systems of today have been formed, due to constant demands on capacity, economy, and reliability. "Initial power systems were small grids with low capacity and low reliability," he notes.

Extensions emerged as dictated by capacity needs. To improve reliability of individual systems by allowing support from the neighbors, these systems were interconnected. New phenomena appeared, such as transient instability, resulting in studies and actions to counteract them.

Complex electrical power systems introduced new transmission capacity problems and actions to counteract them (series capacitors inserted in the transmission line, shunt capacitors and reactors, automatic tap changers to control transformer voltage, etc.). Consequently, this brought new phenomena like sub-synchronous resonance and voltage instability and, again, a need for new studies and equipment, such as flexible ac transmission systems (FACTS) and high-voltage dc (HVDC) links.

"The complexity of the present power system and factors such as the environment and rights of way, governmental concerns, and cost-to-benefit evaluation for large-scale investments will drive us to maximize the use of current assets without truly addressing the rising demand for new infrastructure," says Karlsson. "This trend will continue to challenge the industry with new technology and actions: superconductivity, energy storage, micro-grid, etc."

With the ever-growing demand for power, our industry tends to push the limits challenging reliability with a question: "Where is the edge?" This is particularly true in places such as North America, where grid expansion has been minimal or not rapid enough to meet the demand.

We also note that weakening or splitting the grid on purpose during normal operation is not a sound alternative, as this strategy would bring the grid full circle. For example, by operating the power grid in separate islands one could easily weaken the power system. That would effectively mean coming back to the original design of the small grid with low capacity.

Separating the grid into islands would also impact deregulation. Each distributed generation (DG) unit would be selling power to its own neighborhood, for example. The small utility would soon need occasional support for reserve margin, voltage, and reactive margins from neighboring systems. Hence, back to the interconnected grid. The solution is not in separating the grid into islands, but rather to resolve transmission problems to mitigate the potential for widespread cascading outages.

Chifong Thomas, of Pacific Gas & Electric, dismisses the argument that large blackouts occur because planning engineers spend too much time preventing small blackouts. She argues that this theory ignores the fundamental fact that large blackouts start out as small problems that do not by themselves necessarily lead to blackouts. They lead to blackouts when the small problems are not corrected in time.

Prof. Vijay Vittal, of Iowa State University, agrees that the impact of severe system failures could be mitigated by several approaches such as using corrective controls or performing more effective analyses closer to real time. The effective way to minimize disturbance propagation is to truly understand the common causes and design the appropriate solutions. The system needs to be addressed as a whole, with various planning, operations, maintenance, and regulatory measures implemented in a coordinated way.

One possibility for preventing propagation of the disturbance throughout the interconnected grid, without weakening the grid during normal operation, is to design the interconnected power system to allow for intentional separation into stable islands, or for interruption of small amounts of load, only when the system experiences major disturbances. As operators may not be able to act fast enough to take into account all data related to the online state of the system, separation actions should be performed automatically. Automatic schemes to disconnect (shed) load for unacceptable underfrequency and undervoltage system conditions and power system protection schemes (PSPS), also referred to as special protection schemes (SPS) or remedial action schemes (RAS), already serve such functions. RAS detect abnormal wide-area system conditions and trigger automatic actions to restore acceptable system performance.

Underfrequency load shedding (UFLS) has been widely implemented in the industry. Undervoltage load shedding (UVLS) and RAS have also been implemented worldwide with success (e.g., the WECC Western Region in North America, Hydro Quebec in Canada, EDF in Europe, parts of Eastern Europe, and in Japan). The initiating factor in implementing a significant number of RAS in the WECC has been to better protect the grid against multiple contingencies, particularly after the 1994 and 1996 blackouts.
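The staged logic behind underfrequency load shedding can be sketched as a simple lookup. The pickup frequencies and shed fractions below are hypothetical values for a 60 Hz system, chosen only for illustration. Actual settings are defined by the responsible reliability organization and differ by region, and real relays also apply time delays that this sketch ignores.

```python
# Hypothetical UFLS stages: (pickup frequency in Hz, fraction of load shed).
STAGES = [(59.3, 0.05), (59.0, 0.10), (58.7, 0.10)]


def load_shed_fraction(frequency_hz, stages=STAGES):
    """Cumulative fraction of load shed once frequency has fallen to frequency_hz."""
    return sum(fraction for pickup, fraction in stages if frequency_hz <= pickup)


for f in (60.0, 59.3, 58.9, 58.0):
    print(f"{f:.1f} Hz -> shed {load_shed_fraction(f):.0%} of load")
```

Each stage removes a further block of load as frequency keeps falling, so only as much load is interrupted as is needed to rebalance generation and demand in the affected island.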

Likewise, RAS implementation, combined with use of local reactive support and voltage control safety nets, should be further studied and more extensively implemented in the eastern part of North America and in other grids around the world impacted by recent blackouts or prone to potential blackouts. Designing the grid with appropriate measures for voltage-control and advance-warning systems, such as wide-area protection and control, would allow for both strong interconnected grids during normal operation (to make the system more reliable and secure) and the creation of predetermined islands only when necessary.

Instead of weakening the grid, the power industry needs to address deregulation as one important aspect in understanding the underlying causes of systemwide outages. The bulk power system was often not originally designed to transfer large amounts of power between neighboring systems. Individual power systems were interconnected to improve electrical network reliability by enabling neighboring utilities to support each other during stressed conditions.

In recent years, deregulation has imposed additional requirements of high transfers from new generation sources to the load areas. At the same time, public pressures and the "not in my backyard" sentiment make it difficult to site transmission lines or major local generation sources, especially in the more densely populated heavy load areas, making system expansion very expensive and difficult. As recent disturbances indicate, the above has resulted in grids more vulnerable to blackouts (e.g., voltage-stability problems).

At the 2004 international symposium on bulk power system dynamics and controls in Italy, the history of interconnected Europe was reviewed and the following was identified: In the 1950s the power system professionals foresaw the importance of electricity to customers. Thus, the strategy of interconnecting neighboring systems to improve reliability and security margins became a reality. Coordinated rules for the mutual support of interconnected systems were defined and adopted by the power pool members. Since the late 1970s, however, the electrical transnational infrastructures have been more and more exploited for energy exchanges that take advantage of the different production costs of electricity in the various nations.

Deregulation efforts in North America have followed a comparable pattern for similar reasons. The high level of power exchanges in today's energy market is technically being provided outside the scope of the original system design. The higher demand, coupled with low-level investments in technology and infrastructure upgrades and capacity increase, has led control area operators to run the system close to the edge, as close to the limits as permitted by the reliability criteria and, sometimes, beyond the limits.

Learning from the past

Studying the cascading grid disturbances in the past decade demonstrates the importance of multidisciplinary involvement. In addition, one ponders the reasons for the continued and frequent nature of such events. Approximately 12 million customers were affected in 1994 and 1996 in the western region of North America. The authors believe that lessons learned from these outages, if applied in the eastern part of North America or in Europe, would have helped prevent or minimize the more recent outages.

After all, there are investigative teams that prepare comprehensive reports of findings and recommendations. The 14 December 1994 blackout investigation report had 38 conclusions and made 28 recommendations. Teams that investigated the 1996 blackouts in North America ultimately had a combined list of 130 conclusions and 54 recommendations.

Investigating teams following the blackout on 14 August 2003 identified 60 recommendations (14 by the North American Electric Reliability Council, or NERC, and 46 by the U.S./Canadian Task Force).

The Union for the Coordination of Electricity Transmission (UCTE) report lists 14 observations for the 28 September 2003 outage that are very similar to those identified for the outages in North America. The similarities in findings among the 1994 and 1996 investigative reports, the UCTE report for continental Europe, and the August 2003 North American blackout analysis (Figure 3) show that if the 1994 and 1996 recommendations had been applied in other parts of the world, the impacts of the 2003 outages would have been immensely reduced.

Findings from the worldwide disturbances highlight the need for a concentrated and committed reform process across regions or inter-area systems. Challenges must be solved in a timely manner, given the available technology, and not delayed due to legislative or governmental processes. The reform process should allow the efforts to focus on innovative solutions, recognizing that all interconnected systems will experience some level of blackout when operated outside the intended design limits, similar to those that have experienced unintended widespread outages.

Below are some of the challenges that should be addressed:

Environmental and political factors limiting addition of new generation and transmission capabilities.

Efficient and timely recovery of investment and optimal asset utilization.

Enforceable reliability requirements, such as planning standards for normal and emergency conditions.

Comprehensive understanding of power system behavior and the complex interactions resulting from major disturbances.

Availability of reserve capacity for use during emergency conditions.

Increase in transfer limits should be accompanied by proper investments in infrastructure utilizing the latest technologies.

The recent blackout events underscore the need for investing in efforts to identify and deploy measures to reduce the frequency of occurrence and the impact of severe contingencies.

Blackout prevention

The alarming increase in the number of major blackouts requires exploring new frontiers in deployment of well-defined and coordinated overall plans (planning, operations, and maintenance). As analysis of recent disturbances reveals some common threads among them, the conclusion is that propagation can be arrested and impact of disturbances reduced when knowledge gained is properly utilized. The three T's--trees, tools, and training--have been identified as the leading focus areas to prevent widespread outages not caused by natural disasters.

While tree trimming can help reduce system exposure, other natural events such as storms or dense fog have caused similar propagated disturbances in the past. Tools and training, on the other hand, are two factors requiring human interaction during fast-developing, cascading events. Hence, although the three T's are very important, other areas also need to be addressed to arrive at comprehensive solutions that prevent blackouts regardless of the initiating events or the level of human involvement.

The best way to minimize widespread power system disturbances is to understand their leading causes. Study of blackout history shows that in each case the reliability standards have been violated in some fashion. That demonstrates the need for prioritizing more stringent compliance enforcement standards, as well as the need to invest wisely in new transmission facilities in the "right" areas--thus optimally increasing reliability of the grid--in new grid monitoring technologies and especially in the tools that make it simpler to manage day-to-day operations.

Electric reliability and efficiency are affected by four segments of the electricity value chain: generation, transmission, distribution, and end use. Satisfactory system performance requires investments in all four segments. Increasing supply without improving transmission and distribution infrastructure in the right locations, for example, may actually lead to more serious reliability issues.

The timely retirement and replacement of transmission equipment at the end of its useful life is another important remedy for reducing failure rates and potential outages in the future. Aside from aging infrastructure concerns, the transmission grid must be upgraded and expanded to meet growing demands. For example, high-voltage power electronic devices allow more precise and rapid switching for improved system control and to help increase the level of power transfer that can be accommodated by the existing grid. Distributed energy technologies, if properly applied, could also play a role in relieving power flow demands on the transmission networks.

In conclusion, while new investments should certainly include some new transmission lines, they should also encompass power-delivery technologies such as series capacitors, single-phase operation of transmission lines, FACTS, HVDC links, energy storage, super-conducting materials, and micro-grids.

Aside from legislative and governmental commitments, compliance with reliability requirements and state energy plans are two of the large-scale challenges. Our industry must tackle ways to promote small generator plants in high-energy-use areas, assess new technologies, support customer-owned generation, and promote unobstructed operating visibility among control areas. Measures such as computerized control and data acquisition, phase-shifting transformers, coordination mechanisms and electronic data exchange between operators are other alternatives that will improve the capability of the existing infrastructure and allow for a more robust power exchange.

Furthermore, reliable power system performance requires a balance of many critical components such as adequate reserve; real and reactive power margins; reliable real-time telemetry and status monitoring; real-time state estimation; and properly set, maintained, coordinated, and tuned protection and control systems. Academic, industry, and governmental initiatives are required to set and enforce the standards for voltage control and reactive power practice, to improve system modeling and the validation process, to enhance the operator-training curriculum, and to ensure that operators are assigned responsibility for taking actions to prevent disturbance propagation.

In terms of modeling, quantitative analysis is needed to validate models of generators, turbines, and the associated controls to match actual system oscillations and damping. Likewise, dynamic loading or stability impact on protection devices (designed to operate in faulted conditions such as tree contact) should be considered, and routine protection-coordination studies using accurate models regularly performed. There are also concerns associated with protection and control applications and settings when short-term market conditions for power transfers stress the equipment, risking equipment outage (Figure 4).

Software and hardware tools that improve real-time system monitoring, analysis, control, and protection, including better congestion tracking, visualization, and information sharing over a wide region, will help manage the grid more reliably and cost-effectively on a day-to-day basis, as well as in emergencies. For example, poorly recognized dynamic constraints can unnecessarily narrow operating limits, endanger reliability, and prevent optimal energy transactions, resulting in lost revenues. Real-time security analysis tools are becoming increasingly critical for daily operation to visualize critical stability boundaries and to determine stability operating limits based on actual conditions.

Some key opportunities for improvement include coordinated adaptive protection and control systems and wide-area monitoring with advanced warning systems as elements of a true wide-area protection and control system (WAPC). Figure 5 shows an example. The technology advancements today promote the concept of the "smart grid," an integrated, electronically controlled power system that will offer unprecedented flexibility and functionality, and improve system reliability. The concept of the smart power-delivery system includes automated capabilities to recognize problems, find solutions, and optimize the performance of the system.

The Eastern Interconnection Phasor Project (EIPP), a U.S. Department of Energy initiative to help achieve the above using measurements synchronized through global positioning satellites (utilizing so-called phasor measurement units), is a step toward reliable operation of the interconnected grid.
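
To make the idea of synchronized phasor measurements concrete, here is a minimal Python sketch. It is not taken from the EIPP. It assumes two phasor measurement units report voltage phasors stamped against the same GPS time reference, and it computes the angle separation between the two buses, a quantity that tends to grow as the path between them becomes more heavily loaded. The function names, the measurement values, and the 30-degree alarm threshold are hypothetical, chosen only for illustration.

```python
import cmath
import math

# Hypothetical alarm threshold in degrees; real limits come from system studies.
ALARM_THRESHOLD_DEG = 30.0

def make_phasor(magnitude_pu: float, angle_deg: float) -> complex:
    """Build a complex phasor from a per-unit magnitude and an angle in degrees."""
    return cmath.rect(magnitude_pu, math.radians(angle_deg))

def angle_separation_deg(phasor_a: complex, phasor_b: complex) -> float:
    """Angle of phasor_a relative to phasor_b in degrees, wrapped to [-180, 180]."""
    # Multiplying by the conjugate subtracts the angles; cmath.phase wraps the result.
    return math.degrees(cmath.phase(phasor_a * phasor_b.conjugate()))

def is_stressed(phasor_a: complex, phasor_b: complex) -> bool:
    """True when the size of the angle separation exceeds the assumed threshold."""
    return abs(angle_separation_deg(phasor_a, phasor_b)) > ALARM_THRESHOLD_DEG

# Two buses measured at the same GPS-synchronized instant (made-up values).
bus_west = make_phasor(1.02, 170.0)
bus_east = make_phasor(0.98, -170.0)
separation = angle_separation_deg(bus_west, bus_east)  # 340 degrees wraps to about -20
```

The comparison is meaningful only because both measurements share a common time reference; without it, the two angles could not be subtracted directly.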

Finally, re-examination of traditional planning, operating, system design, protection applications, and device settings will help improve system response so as to slow or limit the spread of cascading outages. The frequency and varying impact levels of recent worldwide blackouts have provided the power industry with opportunities and supporting information to:

Understand the reasons similar types of cascading events in different parts of the world have had different results or varying impact on end users.

Study the complex power system phenomena to minimize propagation in future systemwide events, including validation of the system studies against actual power system performance. Accurate and user-friendly tools are required to study (regularly as system conditions change) the performance of the grid and protection and control devices during disturbed system conditions.

Highlight the needed support for regulatory measures to ease grid expansions, grid reinforcements, and well-established and measured reliability enforcement processes in a most cost-effective way.

Revisit existing operating practices and real-time data-exchange policies among control areas and assure that operators can and will take proper actions to mitigate the propagation of a disturbance.

Assure that operating capacity reserves and margins for transmission flows remain available to allow system adjustments during unintended multiple-contingency conditions.

Implement properly studied and designed automated power-system protection schemes (PSPS). An accompanying figure shows system performance recordings for the August 1996 disturbance.

Implement a rigorous investment strategy by use of strategic asset management and related tactical (e.g., reliability and maintenance improvements) initiatives.
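
The point above about keeping operating reserves available during multiple-contingency conditions can be illustrated with a rough Python sketch. It assumes a deliberately simplified system in which the only question is whether the surviving generating capacity still covers the load after the largest units trip. It ignores transmission limits, reactive power, and ramp rates, all of which matter in practice. The function name and the fleet and load figures are hypothetical.

```python
def reserve_after_contingencies(unit_capacities_mw, load_mw, units_lost):
    """Remaining capacity minus load (MW) after the largest `units_lost` units trip.

    A negative result means the surviving units cannot cover the load.
    """
    # Worst case for capacity: assume the biggest units are the ones lost.
    surviving = sorted(unit_capacities_mw, reverse=True)[units_lost:]
    return sum(surviving) - load_mw

# Made-up fleet and load, for illustration only.
units_mw = [800.0, 600.0, 400.0, 400.0, 300.0]
load_mw = 1500.0

reserve_n0 = reserve_after_contingencies(units_mw, load_mw, units_lost=0)
reserve_n1 = reserve_after_contingencies(units_mw, load_mw, units_lost=1)
reserve_n2 = reserve_after_contingencies(units_mw, load_mw, units_lost=2)
```

In this made-up case the system survives the loss of its largest unit with a thin margin but falls short after a second loss, which is the kind of condition where automated load shedding or other pre-studied actions would have to step in.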

In conclusion, it is important to take a fresh and balanced approach to fixing the system as a whole by implementing various planning, operations, and maintenance measures and weighing the costs, performance impact, and risks associated with each measure. The power industry needs to evaluate how to operate and maintain the power system for the years to come to meet defined reliability objectives. Within the context of this forward-looking system overhaul, specific solutions to reduce the likelihood of outages can be addressed. There is no silver-bullet solution to preventing blackouts, but there are general measures that can and should be taken to minimize the impact of widespread cascading disturbances.

Each entity needs to focus on further process improvement and standardization, and better asset utilization--all parts of overall asset management strategy. This is key to increased reliability and to protecting investments. Prudent capital investment in power-system infrastructure has to be based on stringent cost-benefit analysis to optimize investments. In addition, independent certification of technical systems and business processes can be an important element of assuring that proper actions have been taken, processes implemented, and investments made.

System restoration

Another critical step in minimizing the impact of widespread blackouts is the need for effective and fast power-system restoration. Returning equipment to service, followed by quick restoration of power to users, is of paramount importance and can significantly minimize consequences of further outages.

Today's technology can be used to our advantage for intelligent restoration. Some of the key elements for responsive restoration are:

Well-defined procedures that require overall coordination within the restoring area, as well as with neighboring electric networks.

Reliable and efficient restoration software tools that can significantly aid operators and area coordinators in executing operating procedures and making proper decisions. These tools are part of the Energy Management System (EMS) or SCADA, which provides voltage, frequency, outage status, and other data.

Regular training sessions that assure effectiveness of the process. The sessions should include practice drill scenarios, which should incorporate regional reliability or governmental policy requirements. For example, there may be a time-delay requirement for load restoration after a bulk power system has returned to service, to allow the system to stabilize. There may also be critical loads, which must be given higher priority in restoration.
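
The two policy examples in the last item, a stabilization delay before load pickup and priority for critical loads, can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not a description of any utility's restoration tool. The `LoadBlock` type, the function name, the load names, and the delay and interval values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoadBlock:
    name: str
    megawatts: float
    critical: bool = False

def restoration_schedule(blocks, stabilization_delay_min, step_interval_min):
    """Return (start_minute, block_name) pairs for re-energizing load blocks.

    Critical blocks go first; otherwise the listed order is preserved.
    """
    # sorted() is stable, so blocks keep their listed order within each group.
    ordered = sorted(blocks, key=lambda block: not block.critical)
    schedule = []
    start_minute = stabilization_delay_min  # wait for the bulk system to settle
    for block in ordered:
        schedule.append((start_minute, block.name))
        start_minute += step_interval_min
    return schedule

# Made-up load blocks, for illustration only.
blocks = [
    LoadBlock("residential feeder", 40.0),
    LoadBlock("hospital", 5.0, critical=True),
    LoadBlock("industrial park", 60.0),
]
plan = restoration_schedule(blocks, stabilization_delay_min=15, step_interval_min=10)
```

A drill scenario could vary the delay, the interval, and the set of critical loads to check that operators and tools agree on the resulting order.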

Today's technology allows us to design schemes to aid in quick restoration, a serious consideration for moving to 21st-century power delivery. Even if advanced tools and procedures are in place to speed up restoration, there are limits on how fast the system can be restored depending on the type and distribution of generation. After the August 2003 blackout in North America, it took considerable time to restore generation. Some of the units did not have so-called blackstart capabilities to be put in service immediately, and some units required a longer time to be put online with full power (e.g., nuclear units due to security, and steam turbines due to their ramp-up rates).

Also of equal importance are the type of load served, the system configuration, and the effects of connecting the load back to the network (cold load pickup or hot load pickup affects end-user restoration time). Although power to most of the cities during the Italian blackout in September 2003 was restored in five to nine hours, it took more than a day to restore power to Detroit and New York. Göran Andersson points out that in the 2003 Swedish-Danish blackout, the 400 kV grid was restored within two hours, most customers were connected within four hours, and the last customer was reconnected within six hours.

Mayer Sasson, of Consolidated Edison Co. in New York City, explains the slower pace for load restoration in New York. The city's low-voltage network loads are part of a highly meshed Network system that affords a very high degree of reliability against localized outages. However, under blackout conditions, when a network is to be restored, the Network is isolated into 100 to 200 megawatt portions that need to be re-energized separately, requiring a time-consuming and careful process so that the inrush does not provide a setback to the restoration effort.

As discussed, designing the power system to transfer power across large distances and not providing enough reactive power close to the load or building the accompanying transmission lines may have detrimental effects on power system operation. Similarly, designing the power system and not considering the effects on restoration efforts may have detrimental effects on the speed of restoration. In conclusion, restoration time and system security could be significantly improved by planning the generation mix and location, and by considering not only market factors but also incorporating in the financial model the value of reliable operation and faster restoration. This approach would result in the most favorable long-term investment strategies for everyone in the electricity value chain.


The team of industry experts who produced this article concludes with the following messages:

The power system is very complex and human-made. Our industry needs to keep planning, operating, and maintaining the grid as simply as possible. There is a general understanding of blackouts caused by natural disasters (earthquakes, tsunamis, hurricanes, etc.) and of our inability to do much about them. However, we expect systemwide outages created by human beings and/or not arrested due to suboptimal design (system being operated beyond its original intended design) to be easier to prevent.

There has been a marked increase in the number and frequency of major blackouts since 1994 caused by pushing systems closer to their limits and not taking preventive and corrective measures. Analysis of disturbances also reveals some common threads among wide-area disturbances--which leads us to recognize that propagation can be arrested and impact of disturbances and outages can be reduced.

The probability, size, and impact of multiple-contingency wide-area blackouts can be reduced and the propagation can be arrested when the system is adjusted accordingly and quickly.

Industry needs to:

Learn from the past and address why the number of systemwide events has increased.

Operate the power grid only where system conditions have been studied.

Implement specific solutions, such as automated power-system protection schemes that will reduce the burden on the operators, to reduce the likelihood of outages.

Take steps to reduce restoration times.

A balanced approach to fixing the system as a whole, and for years to come, requires a well-defined and coordinated overall implementation strategy, including various planning, operations, maintenance, and regulatory measures, while weighing the costs, performance, and risks associated with each measure. Governmental and power industry entities, management, and technical professionals need to recognize grid limitations and avoid subtly pushing the "edge" or operating the system for conditions that the system is not designed for.

Finally, with the advent of advancements in information technology, more comprehensive strategic planning tools, better educational dialog with power-system decision makers, innovations in power-system monitoring, and the deployment of advance warning systems to arrest the grid from wide-area outages, power systems can leap forward to meet 21st-century expectations in reliable power delivery. There are challenges ahead; however, there are also many opportunities and solutions for taming the power grid. Some of these are described in more detail in the material listed below for further reading.

As is often the case with complex topics, a picture, graph, or cartoon can illustrate the point more effectively. We have created a cartoon to illustrate how disasters could be avoided by not pushing limits, making sound decisions, and by taking proper preventive and corrective measures.


The authors express their profound appreciation to the distinguished industry experts and IEEE Life Fellows T.W. Cease, Stan Horowitz, Arun Phadke, Jim Thorp, and Mani Venkata for their sage review of this collective effort. Thanks also to our graphic artist, Alex Legaspi, for his attention to detail. Finally, our utmost gratitude is extended to the highly skilled team of contributors, all of whom are noted authors and well-respected members of the power system profession with many years of academic, research and development, and industry experience.

About the Authors

Vahid Madani (SM) is a principal protection engineer at Pacific Gas and Electric Co. His responsibilities include 500 kV transmission protection and controls, reactive compensation, and series capacitor protection and controls. He also has several technical and leadership roles in the Western Electricity Coordinating Council (WECC) and the IEEE. Madani is chair of the Remedial Action Scheme Reliability Subcommittee within the WECC and chair of the IEEE working group on the global industry experiences with system integrity protection schemes, and is a CIGRE task force member for "Design and Deployment of Defence Plan Against Extreme Contingencies." He has more than 22 years of academic and utility experience and has authored numerous publications in power systems topics. Both he and coauthor Damir Novosel are members of the IEEE Power Engineering Society.

Damir Novosel (F) is president and general manager for transmission and distribution consulting at KEMA. Before joining KEMA, he was vice president of global product management for automation products and manager of the power systems consulting group at ABB. He has 23 years of experience working with electric utilities and vendors and has contributed to a large number of IEEE and CIGRE tutorials, guides, standards, and reports dealing with electric power systems. Novosel is the past chair of the IEEE PSRC Sub-Committee on System Protection and holds 16 U.S. and international patents.


In addition to Vahid Madani and Damir Novosel, other members of the IEEE Power Engineering Society who contributed to this article were Göran Andersson, Eidgenössische Technische Hochschule University, Zurich; Daniel Karlsson, ABB, Sweden; Bogdan Kasztenny, General Electric Co., Canada; and from the United States, Paul Myrda, Trans-Elect Inc.; Mayer Sasson, Consolidated Edison Co.; Carson Taylor, Bonneville Power Administration; Chifong Thomas, Pacific Gas & Electric Co.; and Vijay Vittal, Iowa State University. The cartoon panel "Lost Joe" is by Alex Legaspi.

To Probe Further

V. Madani, D. Novosel, A. Apostolov, S. Corsi, "Innovative Solutions for Preventing Wide-Area Disturbance Propagation," International Institute for Research and Education in Power Systems (IREP) Symposium, Cortina d'Ampezzo, Italy, August 2004.

NERC Recommendations to August 14, 2003, "Blackout--Prevent and Mitigate the Impacts of Future Cascading Blackouts"; https://www.NERC.com.

Western Systems Coordinating Council Disturbance Summary Reports for Power System Outages Occurred in December 1994, July 1996, and August 1996, respectively; https://www.WECC.biz

P. Fairley, "The Unruly Power Grid," IEEE Spectrum, pp. 22–27, August 2004.

A. Hellemans and M. Mukerjee, "Computing the Cosmos," IEEE Spectrum, pp. 28–34, August 2004.

D. Novosel, M. Begovic, V. Madani, "Shedding Light on Blackouts," Power and Energy Magazine, January/February 2004.

S.H. Horowitz and A.G. Phadke, "Boosting Immunity to Blackouts," Power and Energy Magazine, September/October 2003.

C.W. Taylor, "Improving Grid Behavior," IEEE Spectrum, pp. 40–45, June 1999.
