Q & A with Robert Cunningham
Lincoln Laboratory’s Robert Cunningham leads the Institute for Information Infrastructure Protection (I3P), a consortium dedicated to the security of process control systems in an age of increasing vulnerability.
The process control computers that are in charge of such functions as distribution of electricity along the nation’s power grids, the flow of natural gas through pipelines, and the operation of water treatment plants are becoming more and more accessible through the Internet, and thus potentially vulnerable to terrorist attack. Robert Cunningham, associate leader of Lincoln Laboratory’s Information Systems Technology Group, leads a project for a consortium of universities, national laboratories, and federally funded research and development centers called the Institute for Information Infrastructure Protection (I3P). The aim of I3P is to make sure such process control systems are secure. Lincoln Laboratory Journal contributing writer Neil Savage spoke with Cunningham about the consortium and the problems it aims to solve.
Lincoln Laboratory Journal: When you began to look into the state of this infrastructure, what surprised you most?
Robert Cunningham: My biggest surprise was when I got to see a process control plant. What I saw was a 30-year-old DEC machine—Digital Equipment Corporation, which no longer exists—as the primary process control system for this particular plant. And it was operating side by side with a Windows 2000 system that’s also many years old and that has lots of well-known vulnerabilities. Then there’s a modern laptop sitting right next to the other two systems. So it’s like 30 years of the history of computing equipment, all nearby, some just recently connected to the Internet. All these things sitting in one room, sometimes connected together, are pretty worrisome.
LLJ: Why are process control systems so vulnerable?
Cunningham: Most of these systems are RTUs and PLCs. An RTU is a remote terminal unit and a PLC is a programmable logic controller. They were originally designed to talk on a dedicated wire that connected them to a human–machine interface—so there was no sort of authentication ever built into the protocols they use. There is nothing that says, “Are you really my PLC?” And there’s nothing in the protocol to ensure that the data can’t be changed because it was assumed that faults would occur very rarely. Finally, the network software stack was tested in very limited ways—making assumptions about what would talk to it and how.
The second thing to know is that the commodity systems that these are being built on stay in service for exceptionally long times and, unlike an enterprise computer system (say, the Macintosh that I have back in the office), they aren’t patched at the same rate. It’s easy to see why. After all, these things are the process control for the plant. Taking them down and upgrading them is a big deal, resulting in lost production and revenue. So if they are not broken, they are not usually fixed. After a decade of unpatched service, an enterprise operating system typically will have many well-known vulnerabilities. These vulnerabilities have particular importance in process control systems, which are where computer systems touch the physical world. If the systems aren’t carefully designed and operated, human beings can die.
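To make the missing "Are you really my PLC?" check concrete, here is a minimal Python sketch of one common way to add origin and integrity checking to a message: a keyed hash (HMAC) appended to each frame. It is an illustration only, not a description of any real PLC or RTU protocol. The shared key, the command string, and the function names seal and unseal are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared key, assumed to be provisioned on both the controller
# and the operator station. Not a real product setting.
SHARED_KEY = b"example-key-not-for-production"
TAG_LEN = 32  # size of a SHA-256 digest in bytes


def seal(payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can check origin and integrity."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def unseal(frame: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return the payload if the tag verifies; raise ValueError otherwise."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short")
    payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return payload
```

A frame produced by seal(b"OPEN_VALVE 4") passes unseal unchanged, while a frame whose payload bytes were altered in transit, or one built without the key, is rejected. A tag like this does not by itself stop an attacker from replaying an old valid frame; that would need a counter or timestamp inside the payload.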
LLJ: What sort of mischief might someone cause by hacking into these systems?
Cunningham: I have a couple of examples. The first is the Bellingham, Washington, incident where there was a 16-inch pipeline that ruptured and poured 237,000 gallons of gasoline into a creek. Two 10-year-old boys and an 18-year-old fisherman died when the gas ignited and sent a fireball down the creek. This was not an example of terrorism. It’s simply a case where a system administrator was changing the records of a database to monitor a few extra things. The admin committed the changes and went to the bathroom for 15 minutes. While he was gone, the people who were in charge of controlling the system had opened up the input to this long pipeline but were unable to control the output. More and more and more gas was flowing into the pipe with nowhere to go. Because their monitoring at the far end wasn’t working either, they couldn’t see that the output hadn’t opened, and they couldn’t tell that there was a pressure problem as well. As a result, the pipeline ruptured. This case pretty clearly demonstrates that if you’re not careful and your process control system fails, you can end up killing people.
The other example that I like to use is from the wastewater treatment plant in Maroochy Shire, Australia. A disgruntled former employee of the company that made the facility’s process control system applied for, but was turned down for, a job at the wastewater plant. Monitoring and control for these systems was communicated wirelessly, and he knew how to access the network over that wireless link. So he would drive to a site near a receiver, connect in as pumping station #4, and reprogram the PCS to cause the control systems to dump sewage into the nearby rivers. Then he would call up the operating company and say, “You know, I would serve as a consultant for you if you need somebody to help fix your problem.” Eventually he was caught and sentenced to two years in prison.
LLJ: Could he have done this from the Internet?
Cunningham: I’m not sure. But the employee knew the PCS network better than the business network, and he may simply have exploited what was easiest for him. I do know that it’s increasingly common to have the PCS network connect to a DMZ network, which connects to a business network, which connects to the Internet. Sometimes companies don’t think this is so. A few years back Paul Dorey, then chief security officer at BP, asked if the company’s process control systems were connected to the Internet. Dorey was told that none were. He was skeptical, though, and so he did a careful study that discovered that in fact 89% of BP’s process control systems were connected to the Internet. If those system connections are carefully designed, there is at least one firewall, and maybe more. A common mistake is to think only of outside attackers. But if attackers can get to the PCS networks, then they can often reach back as far as into the business network.
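The chain described here, from PCS network to DMZ to business network to Internet, can be checked mechanically. The Python sketch below is a toy model, not a real audit tool: it treats each network as a node, each connection as a two-way link, and uses breadth-first search to find a path between two zones. The zone names and the LINKS table are assumptions for illustration, and firewall rules are deliberately left out.

```python
from collections import deque

# Hypothetical zone graph: an entry means traffic can pass from the key zone
# to each listed zone. Firewall rules are not modeled.
LINKS = {
    "internet": ["business"],
    "business": ["internet", "dmz"],
    "dmz": ["business", "pcs"],
    "pcs": ["dmz"],
}


def shortest_path(links, start, goal):
    """Breadth-first search; returns the list of zones from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        if zone == goal:
            hops = []
            while zone is not None:
                hops.append(zone)
                zone = parent[zone]
            return hops[::-1]
        for nxt in links.get(zone, []):
            if nxt not in parent:
                parent[nxt] = zone
                queue.append(nxt)
    return None
```

With the table above, a path exists from "internet" to "pcs" through the business and DMZ networks, and the same links let a foothold on "pcs" reach back to "business". An operator who believes the plant is isolated would expect None in both directions.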
LLJ: Are the system operators not aware these problems exist?
Cunningham: Not until recently. With some of the problems, the operators hadn’t really thought through the process. There was an argument until relatively recently that said, “Well, I don’t think we have a problem here,” because we didn’t have examples of real cases where people had successfully attacked systems. So that’s why the Maroochy Shire example is a very nice one.
Even when the IT folks in a water treatment plant started to notice that a problem existed, little changed at first. In some cases, they were responsible for the business network but not the process control network, and in other cases their suggestions to improve the security of the system were falling on deaf ears.
Furthermore, the IT personnel at the plants didn’t have the tools to make the business case for better security. In the market space, most operators were not asking for security features to be built into products. That needs to happen—there needs to be a market pull. You also need to have a market push—vendors should be making equipment available that is more secure at about the same price. The program that I’ve been working on in collaboration with lots of other folks from other labs and other universities has been trying to build both pieces of this, working with operators to build the pull for buying new equipment, and working with the vendors to improve the quality of the offerings and the security of their products.
LLJ: What is Lincoln Laboratory doing about the problem?
Cunningham: We helped fashion the research program, and we have one element of the solution. Our piece in all of this is trying to secure software that vendors are making as a part of the PLCs and the RTUs. We’re building a tool that will allow vendors to automatically test for certain vulnerabilities. Input is fed into the system, the system runs, and our instrumentation keeps track, for example, of how much memory is allocated or free. It can also tell you if you write beyond a range of memory dedicated for that information. If memory use starts to grow without bound or is improperly written to, we can tell you exactly where—at what line of code—the potential vulnerability is occurring. Then vendors can go back and fix their software to make sure that the vulnerability doesn’t continue. I hope that this tool will become useful elsewhere at Lincoln Laboratory, too. The Laboratory has lots of long-lived embedded systems, like satellites and radar systems, that could benefit from a tool like this.
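The kind of bookkeeping described in this answer can be pictured with a small Python model. It is only a sketch of the general idea (record every allocation, check every write against its block, and report the source line), not the Laboratory's tool, whose design is not described here. The class name MemoryTracker, its methods, and the fixed growth_limit threshold are all hypothetical.

```python
class MemoryTracker:
    """Toy model of instrumentation that records allocations and checks writes."""

    def __init__(self, growth_limit):
        self.blocks = {}                  # base address -> block size in bytes
        self.in_use = 0                   # total bytes currently allocated
        self.growth_limit = growth_limit  # assumed threshold for "growing without bound"
        self.findings = []                # list of (kind, source line) tuples

    def alloc(self, base, size, line):
        """Record a new block and flag the line if total use exceeds the limit."""
        self.blocks[base] = size
        self.in_use += size
        if self.in_use > self.growth_limit:
            self.findings.append(("unbounded-growth", line))

    def free(self, base):
        """Release a block; unknown addresses are ignored in this sketch."""
        self.in_use -= self.blocks.pop(base, 0)

    def write(self, base, offset, length, line):
        """Flag the line if the write falls outside the block it targets."""
        size = self.blocks.get(base)
        if size is None or offset < 0 or offset + length > size:
            self.findings.append(("out-of-bounds-write", line))
```

In use, the program under test would be instrumented so that each allocation, release, and write calls the matching method with its source line number. After a test input has been run, findings lists which lines wrote past the end of a block or pushed memory use beyond the chosen limit.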
LLJ: What other steps is the consortium taking?
Cunningham: You have to think about this from the point of view of an operator because the marketplace is ultimately going to have to buy this capability. An extremely high percentage of the nation’s critical infrastructure is owned by independent systems operators. So we have to talk to them and help them build a business case for why it’s important to include security in their systems. Then we tell them some of the questions that they need to be able to ask their vendors.
LLJ: What sort of questions?
Cunningham: Are they using secure protocols? Have they had their software systems tested? Has the platform they’re working on been hardened against sets of attacks? The first couple of questions about rigorous testing are being answered by the work we do at Lincoln Laboratory. The folks at the Pacific Northwest National Laboratory are working on platform hardening. Then, operators need to configure their systems so that only certain people, coming from certain locations, are able to access and control the components and processes running there. Researchers at the University of Illinois at Urbana-Champaign are developing a tool to make sure that firewalls are being configured correctly.
Once operators have everything configured in a way that’s secure, they still need to monitor use, because they usually don’t want to prevent all access, and not all the attackers come from the outside. So we’ve got a mechanism to monitor use that is being worked on at the University of Tulsa. And then because we think even that may someday fail, we need the system to be recoverable as quickly as possible. Our team members at Sandia National Laboratories are trying to build a tool to automatically recover and restore a system that’s been broken.
The last thing we are trying to do is help out with technology transfer. Our first tool is being made commercially available this year. This is MITRE’s RiskMAP tool, which connects business objectives to network nodes and indicates where investment should be made.
LLJ: What else do you have to do to get the message out about securing the nation’s infrastructure?
Cunningham: The program has been going on for two and a half years now, and it’s got about another year and a half left. We usually say that there are three sets of customers. One is the government, which has asked us to help run a number of workshops for them. We’ve done that. In fact, we just had a very successful workshop in Houston, where the attendees raved about our approach. And I’ve participated in webcasts for the SANS (SysAdmin, Audit, Network, Security) Institute. Another customer is academia. We’ve held a couple of conferences and published one book covering our and others’ research, and we’ll be working on a second book this coming year. I’ve given several invited talks at various universities. Finally, there is industry. We’ve got a couple of patents pending, and we hope to file for additional ones within the next year. We’d like to have at least one more commercial system on the market in addition to the MITRE tool. We’ve also put together an advisory board made up of vendors and operators in the oil, gas, and chemical businesses. Some companies have asked us to come in and help work with them, which has two benefits: it’s an opportunity for them to make their systems more secure and for us to make sure that the systems that we’re building actually work. In fact, we get more requests than we can handle.