What was the root cause of the FAA system outage?

On January 11, 2023, the Federal Aviation Administration (FAA) experienced a major system outage that grounded all domestic departing flights across the United States for several hours. This unprecedented event had massive ramifications for air travel, causing cancellations and delays across the country. Understanding the root cause of this outage is critical to preventing similar events in the future.

What happened during the FAA system outage?

At approximately 6:28 am EST on January 11, the FAA ordered all domestic departing flights grounded until 9 am EST. The order came after the FAA’s Notice to Air Missions (NOTAM) system failed. NOTAM provides essential safety information to flight crews before takeoff. With the system down, thousands of flights across the US were delayed or cancelled. Major airports were affected, including Los Angeles International, Chicago O’Hare, Hartsfield-Jackson Atlanta, and Dallas/Fort Worth. In total, over 11,300 flights were delayed and over 1,300 were cancelled due to the outage.

The FAA lifted the ground stop order at 9:01 am EST after restoring the NOTAM system. However, the disruption had lasting effects throughout the day, with over 6,000 more delays and 1,000 more cancellations reported even after the ground stop was lifted. The FAA continued investigating the cause and working to stabilize its systems throughout the day and into the next.

What is the NOTAM system?

NOTAM, or Notice to Air Missions, is a system that provides critical pre-flight information to pilots, flight crews, dispatchers, and others involved in aviation. It alerts pilots to potential hazards and distributes essential information about airports, airspace restrictions, and changes to airport facilities or procedures. NOTAMs contain vital details that can affect safety and flight planning.

Some key information provided through NOTAM includes:

  • Runway and taxiway closures
  • Construction or maintenance work affecting airports
  • Airspace restrictions for occasions such as sporting events or VIP movements
  • Inoperable navigational aids or lighting systems
  • Temporary flight restrictions

Pilots are required to review NOTAMs before their flights so they have the latest safety information for their routes and destinations. The system provides the real-time situational awareness pilots need to operate safely.

How and why did the NOTAM system fail?

The FAA identified a corrupted database file as the root cause of the NOTAM failure. NOTAM information is stored in and distributed from redundant databases. On the morning of January 11, one of these database files became corrupted, which prevented the timely delivery of updated information.

According to the FAA, there was no evidence of a cyberattack, hacking, or deliberate sabotage behind the corrupted file. Instead, it appears that a timing issue in a regular overnight database maintenance process combined with a latent software bug to damage the data file.

Specifically, the FAA indicated that the problem arose during a procedure to purge old NOTAM data and incorporate new information. This regular overnight process helps limit database size. However, a long-standing latent bug caused the databases to briefly lose connectivity during the file transfer, and the file being transferred became corrupted as a result.

The redundant NOTAM databases are designed to prevent any single point of failure. However, the latent software bug meant that redundancy ultimately failed here. With one database remaining corrupted, updated information could not be delivered across the system. This led to a nationwide blackout of safety information.
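
The point about redundancy can be illustrated with a minimal Python sketch. It is not the FAA's implementation, whose internals are not described here. The file paths, the `first_healthy_copy` helper, and the use of SHA-256 digests are all assumptions made for illustration. The idea is simply that a redundant copy only helps if a reader can tell a healthy copy from a corrupted one.

```python
import hashlib
from pathlib import Path
from typing import List, Optional


def digest_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (assumes it fits in memory)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def first_healthy_copy(copies: List[Path], expected_digest: str) -> Optional[Path]:
    """Return the first copy whose digest matches the expected value.

    Hypothetical helper: returns None when every copy is missing or
    corrupted, which is the case where redundancy no longer helps.
    """
    for copy in copies:
        if copy.exists() and digest_of(copy) == expected_digest:
            return copy
    return None
```

In this sketch, a caller would pass the list of redundant copies together with a digest recorded when the data was last known to be good. A return value of `None` corresponds to the situation described above, where no usable copy remains.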

Timeline of key events

Here is a timeline of how the outage unfolded on the morning of January 11, 2023:

  • Overnight: Routine database maintenance process initiated. NOTAM file transfer begins but connection briefly drops due to latent software bug, corrupting file.
  • 5:28 am EST: NOTAM system detects corrupted file and redundancy fails. NOTAM information stops updating across system.
  • 6:28 am EST: FAA orders nationwide ground stop on all domestic departing flights until 9:00 am EST due to NOTAM outage.
  • 7:19 am EST: Southwest Airlines proactively cancels hundreds of flights in a bid to reset operations.
  • 8:50 am EST: FAA begins testing fixes to restore NOTAM system.
  • 9:01 am EST: FAA lifts ground stop order after verifying NOTAM data is once again accessible.
  • 9:00 am – 7:00 pm EST: Cascading delays and cancellations continue impacting flights across country.
  • January 12: FAA continues efforts to determine root cause and restore confidence in overall system.

What was the impact on US air travel?

The effects of the NOTAM outage were massive, rippling across US air travel throughout the day on January 11. Some key facts about the disruption include:

  • Over 1,300 flights cancelled
  • More than 11,300 flights delayed
  • More than 1 million air travelers impacted
  • Operations slowed at major hubs like Atlanta, Chicago, and Los Angeles
  • Delays and cancellations continued hours after grounding lifted
  • Economic loss estimated between $400 million and $600 million

The wider implications were also significant. The halt disrupted supply chains, prevented workers from traveling, separated families mid-journey, and upended business operations. Public faith in the air traffic system was also shaken by the outage. The ripple effects from the NOTAM failure caused a chaotic day for air travel.

Flight Cancellations

Over 1,300 flights were cancelled due to the outage. Cancellations by airline:

Airline      Flights Cancelled
Southwest    366
Delta        260
United       121
American     97
JetBlue      77
Other        392

Southwest proactively cancelled hundreds of flights beyond those halted by the ground stop, in a bid to reset operations. This added to the overall cancellations across airlines.
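
The per-airline figures above can be cross-checked with a few lines of Python. This is only an arithmetic check on the numbers as listed in this article; the variable names are illustrative.

```python
# Cancellation counts as listed in this article.
cancellations = {
    "Southwest": 366,
    "Delta": 260,
    "United": 121,
    "American": 97,
    "JetBlue": 77,
    "Other": 392,
}

total = sum(cancellations.values())
print(total)  # 1313, consistent with "over 1,300"

# Each airline's share of the listed total, as a percentage.
shares = {name: 100 * count / total for name, count in cancellations.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

On these figures, Southwest accounts for a little over a quarter of the listed cancellations.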

Flight Delays

There were over 11,300 delays on January 11 due to the outage. Major airports saw the following numbers of departure delays:

Airport              Delays
Dallas/Fort Worth    1,462
Los Angeles          999
Atlanta              906
Denver               754
San Francisco        728

Delays continued to grow even after the ground stop order was lifted, as downstream effects disrupted traffic flow. Airlines brought in additional staff to help reset operations.

Economic Impact

Between cancellations, delays, and ongoing recovery efforts, the outage is estimated to have cost the US airline industry between $400 million and $600 million in lost revenue. There were also wider economic losses tied to effects on passenger travel, tourism, supply chain flow, and business activities.

Response and investigations

In the aftermath of the outage, the response focused on investigating root causes, restoring confidence in the system, and preventing a recurrence. Key actions included:

  • FAA teams worked to restore NOTAM database integrity and functionality.
  • Agency leadership briefed the Department of Transportation, White House, and other authorities.
  • A task force was formed to conduct a root cause analysis of the failure.
  • The FAA Administrator testified before Congress on the outage.
  • Protocols and contingencies for NOTAM disruptions were reviewed.
  • Plans were developed to prevent similar technical and procedural issues in the future.

Investigations ultimately identified the maintenance timing error and latent software bug as the technical root causes. But there were also critiques of the FAA’s contingency planning and lack of a clear public communications strategy during the crisis. The agency was faulted for providing unclear messaging amidst the chaos.

Preventing future outages

To help prevent a recurrence of the NOTAM failure, the FAA outlined several steps it is taking based on the root cause analysis:

  • Fixing the latent software issue that allowed the corruption to occur.
  • Enhancing the database file transfer process to be more robust.
  • Strengthening redundancy capabilities.
  • Updating standard operating procedures during database activities.
  • Establishing more comprehensive contingency plans.
  • Improving crisis communications protocols.
  • Ongoing monitoring and assessment of NOTAM system health.
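
One common way to make a file transfer more robust, in the spirit of the steps listed above, is to copy into a staging file, verify the copy, and only then swap it into place. The Python sketch below shows that general pattern under stated assumptions. The `safe_transfer` function and the `.staging` suffix are hypothetical names chosen for illustration and do not describe the FAA's actual fix.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_transfer(source: Path, destination: Path) -> None:
    """Replace destination with a verified copy of source.

    The copy is written to a staging file next to the destination.
    If the staged copy does not match the source, it is discarded
    and the existing destination is left untouched.
    """
    staging = destination.with_name(destination.name + ".staging")
    shutil.copyfile(source, staging)
    if file_digest(staging) != file_digest(source):
        staging.unlink()
        raise OSError("staged copy does not match source")
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(staging, destination)
```

The design choice here is that an interrupted or corrupted transfer can damage only the staging file, so the previous good copy stays available until the new one has been verified.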

Beyond the NOTAM system itself, the FAA highlighted plans for broader efforts to upgrade technologies, improve personnel training, and streamline coordination and information sharing with airlines and other aviation stakeholders.

However, challenges remain in modernizing such a complex, interconnected system across the national airspace. While progress can be made, outages and failures cannot be fully eliminated given the intricacies and scale involved. Continued vigilance and system-wide review will be required.

Key takeaways

Some key takeaways from the January 11, 2023 FAA NOTAM outage include:

  • A corrupted database file prevented NOTAM information distribution, grounding US flights.
  • Maintenance timing issue and latent software bug combined to damage the file.
  • Outage crippled air travel, with 1,300+ cancellations and 11,300+ delays.
  • Economic impact estimated between $400 million and $600 million.
  • FAA focused on restoring system and investigating root causes.
  • Steps underway to fix technical issues and improve contingency planning.
  • Challenges remain in modernizing intricate national airspace systems.
  • Continued review needed across technologies, training, and coordination.

Conclusion

The NOTAM system failure on January 11, 2023 was an unprecedented event for the US air travel network. A corrupted database file brought the distribution of essential safety information to a halt, prompting a nationwide ground stop. The resulting delays and cancellations rippled across aviation. The root cause analysis pointed to a maintenance timing issue and a latent software bug that combined to damage a database file. The FAA now faces both technical and organizational challenges: improving the NOTAM system, modernizing legacy infrastructure, enhancing training, and implementing more robust contingency planning. While progress can and will be made, the intricate nature of the national airspace means risks cannot be fully eliminated. This major system outage has underscored the need for ongoing vigilance, coordination, and review to uphold the resilience and safety of American air travel.