Was the FAA outage caused by an intern?

The recent nationwide outage of the Federal Aviation Administration’s (FAA) Notice to Air Missions (NOTAM) system has raised many questions about what caused the disruption. Some have speculated that the issue may have been caused by an intern accidentally deleting files or making changes to the system. While the true cause is still under investigation, blaming an intern is an overly simplistic explanation for a complex technical failure.

What do we know about the FAA outage?

On January 11, 2023, the FAA experienced an outage of its NOTAM system, which provides safety information to flight crews. The agency ordered a ground stop of all domestic departures that lasted roughly 90 minutes, until the system came back online. Early counts showed over 1,200 delayed flights and nearly 100 cancellations, and the totals kept climbing through the day as the delays rippled across the network.

The NOTAM system runs on legacy software and hardware that is roughly 30 years old. While upgrades were in progress, the FAA's preliminary review traced the nationwide crash to a damaged database file compounded by a backup failure. So far, there is no evidence of a cyberattack or other intentional malicious action.

Could an intern really cause a major outage?

It’s unlikely a single intern could accidentally take down the entire NOTAM system. The FAA has said there are multiple layers of security to access the servers and systems that run NOTAM. It’s not as simple as an intern stumbling onto a delete key.

Major outages are usually the result of multiple failures happening at the same time. The FAA was already operating on aging infrastructure while trying to upgrade. This increased the fragility of the system. While human error may have played a role, there are likely complex technological factors also at fault.
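To make the "multiple failures at the same time" point concrete, here is a minimal sketch of the arithmetic. It assumes the protective layers fail independently, which real systems often do not, and every probability in it is a made-up illustrative number, not FAA data. The function name is hypothetical.

```python
import math
from typing import Sequence


def simultaneous_failure_probability(layer_failure_probs: Sequence[float]) -> float:
    """Chance that every layer fails at once, ASSUMING the layers are independent."""
    for p in layer_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(layer_failure_probs)


# Illustrative numbers only: primary database, backup, human review.
well_maintained = [0.01, 0.05, 0.10]
aging = [0.05, 0.20, 0.50]

# How much more likely a full outage becomes when every layer is weaker.
risk_ratio = (
    simultaneous_failure_probability(aging)
    / simultaneous_failure_probability(well_maintained)
)
```

With these invented figures, modest weakening at each layer compounds into a full outage that is about 100 times more likely. That is why aging infrastructure matters more than any single mistake.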

Why blaming an intern oversimplifies things

Saying “an intern did it” makes for a catchy headline and taps into stereotypes about inexperienced young people causing problems with technology. However, major system outages are almost always due to the interplay of people, processes and technology – not just one person making a mistake.

Scapegoating one intern distracts from the real issues like inadequate funding, failure to upgrade legacy systems, lack of redundancy, and insufficient training and oversight. Problems arise not from single events but from organizational flaws at multiple levels.

The risks of outdated infrastructure

The FAA outage highlights the vulnerabilities of relying on outdated systems that have exceeded their expected lifespans. Many critical systems across government still run on legacy technology that is decades old.

Failing to upgrade leads to fragility. Newer systems have failsafes that can prevent or reduce downtime from technical glitches or human error. Modernization needs adequate resources and commitment to avoid preventable outages.

Steps to prevent future outages

Rather than seeking a scapegoat, the FAA needs a thorough investigation of the root causes. It should:

  • Audit existing systems and infrastructure
  • Assess risks, failure points and redundancies
  • Identify priorities for upgrades and modernization
  • Review policies, procedures and training
  • Develop failover systems to reduce downtime
  • Ensure adequate maintenance staffing and resources
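
As one illustration of the failover idea in the list above, the sketch below checks each copy of a data file against a known checksum and uses the first copy that passes. It is a simplified assumption about how such a check could work, not a description of the NOTAM system. The function names and file layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional, Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def first_valid_copy(copies: Iterable[Tuple[Path, str]]) -> Optional[Path]:
    """Return the first file whose checksum matches its expected value.

    `copies` is an ordered list of (path, expected_sha256) pairs,
    primary first, then backups. Returns None if every copy is
    missing, unreadable, or damaged.
    """
    for path, expected_digest in copies:
        try:
            if sha256_of(path) == expected_digest:
                return path
        except OSError:
            # Missing or unreadable copy: try the next one.
            continue
    return None
```

A caller would pass the primary file first and the backups after it, and treat a return value of None as a signal to alert operators rather than load damaged data. The check only helps if the backups are stored and updated separately, so that one bad write cannot damage every copy.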

Upgrading complex legacy systems takes time and resources. But preventing future disruptions that put lives at risk needs to be a national priority.

Conclusion

Blaming the FAA outage on a hypothetical intern is an overly simplistic hot take. Major system failures have complex causes involving technology, processes and human factors. Upgrading outdated infrastructure and building in redundancy need to be priorities so that technical glitches or human error do not disrupt critical systems like air traffic control.