On episode 15 of CyberSound, Jason and Paul sit down with Fred Kass, Associate CIO of Trinity College, to discuss what leadership looks like during incident response.
Leadership in Incident Response
[00:00:01.210] – Speaker
This is CyberSound, your simplified and fundamentals-focused source for all things cybersecurity, with your hosts, Jason Pufahl and Steve Maresca.
[00:00:12.250] – Jason Pufahl
Welcome to CyberSound. I’m your host, Jason Pufahl, joined as always by Steve Maresca. And today, we’ve got a special guest, the Associate CIO of Trinity College, Fred Kass.
[00:00:23.290] – Jason Pufahl
Alright. So again, I think we’ve already demonstrated, in the first half-hour just chatting before this podcast, that this is a topic we’re all interested in. We’re going to talk about incident response, but not the traditional “hey, what led to the incident, let’s talk about the technical details.”
[00:00:41.530] – Jason Pufahl
Really, I think it’s much more about the human element of incident response: discussing how incidents impact an institution, how they impact people, how you effectively manage emotions and workloads, and everything else as you work through an unexpected event like this. For a little context for everybody, Trinity College had an incident six months ago. Is it around there?
[00:01:03.550] – Fred Kass
Over a year.
[00:01:04.750] – Jason Pufahl
Over a year? That’s just how time moves. So an incident over a year ago that Trinity College and Vancord worked through together. And so I think we can bring some pretty unique perspectives around how that felt, how communications influenced the incident, and how keeping people as fresh as you can, in spite of maybe 18-hour workdays, is important.
[00:01:28.810] – Jason Pufahl
Fred, we were chatting a little bit before, and you mentioned spending time on the idea of business continuity and managing business operations during incidents. So maybe that’s a great spot to segue into.
[00:01:40.330] – Fred Kass
Yeah. I guess I’ll say, Jason, we’ve known each other for a while, and I think one of the times we spent a lot of time together in the past was around leadership in an organization that we’re both in. I’m a big proponent of thinking about leadership and where that fits into these conversations. So I guess I’ll say in my case, I think part of business continuity is actually the leadership and the people side of it.
[00:02:10.510] – Fred Kass
So you got to think about the servers and their uptime and all that kind of stuff. But in my mind, part of the planning process really has to do with the team. Does the team work well together? Do you understand your roles? Do you understand how you’re most effective as a high-performing team? And that’s important just in general. But in an incident response, it’s really heightened, like it’s a different level of stress. So it’s about leadership in a crisis or leadership and stress and how you manage those processes.
[00:02:49.570] – Steve Maresca
Roles change. That’s the bottom line. And people who are really the leaders of a conversation in a relaxed environment certainly may not be when under the gun. And that is a big challenge in these events. I care a lot about the human element in these, and managing that and keeping everyone calm, collected, and making decisions based on good data is effectively the primary challenge in my view.
[00:03:16.750] – Jason Pufahl
So how did it start? And by that, I mean, you had an event. You had staff that probably came into work feeling like it was a normal day. I always say you have to manage almost the excitement that surrounds these initially because there’s a crisis. And then everybody feels like they need to get involved, and they need to contribute because the institution they work for, that they care about, is under attack. How do you slow that down a little bit and make sure you’re making good decisions right out of the gate?
[00:03:47.950] – Fred Kass
That’s a good question. It’s funny. I feel like you keep asking me questions, and I answer different questions.
[00:03:55.570] – Jason Pufahl
But that’s normal [crosstalk 00:03:56]
[00:03:56.530] – Fred Kass
That’s what I think I like about our relationship: we have a different approach to the same scene, or a different angle on the same thought. Incident response, in some ways, is dealing with… Everyone in the IT world has dealt with the system being down. And so some of that crisis management or response is normally built into IT people, so to speak, like, “Oh, crap. Something broke. We’ve got to go fix it.”
[00:04:26.990] – Fred Kass
And so I think that response you’re talking about is normal. I think we’re used to it from the olden days when you ran your own email. When the email server went down or the network melted down, it always became an all-hands-on-deck event. I think understanding how you come together as a team and how you assign roles in that process is an important part of it. I don’t know. Did I answer your question, or did I go off on my own?
[00:04:58.190] – Jason Pufahl
So it’s funny. No. I almost think, well, then, are you in effect doing tabletops every time something breaks? I never really thought about it that way.
[00:05:07.250] – Steve Maresca
My view on that is that that’s true. However, the tenor of the moment is very different in an incident. There’s uncertainty. You don’t know where the origin is. The scale could be tiny, or it could be enormous. You don’t know at the outset. And wrapping around that is a far bigger challenge than something reported by a user. Clearly, the email server’s down. Okay. Let’s go figure that out. We know how to solve that problem. People work best when problems are divisible and put into small boxes. Incidents are inherently the opposite.
[00:05:47.810] – Fred Kass
Yeah. I think you’re right to an extent. But understanding whether the email is down because the server room is underwater or because there was a small bug in the software code, that part of it, I think, has a similar cadence to it. The thing that really differentiates it is, once you understand what’s going on with the machine room in the email example, you can then react to it. What I think is different about a cybersecurity incident is there is an active adversary on the other side who is reacting to you.
[00:06:28.360] – Jason Pufahl
[crosstalk 00:06:28] of the moment is very, very different.
[00:06:30.650] – Fred Kass
Yeah. Answering your question from a moment ago, that first hour, or maybe half-hour, feels very similar to any sort of business crisis-
[00:06:43.010] – Jason Pufahl
And evaluation phase. Sure.
[00:06:45.050] – Fred Kass
But it’s the next phase of it, because you can’t wrap your hands around it and go, “All right, I understand this problem, and I understand how to solve it. Now I develop a plan.” I mean, that’s true. But the plans have to be more adaptable, more fluid. And the environment becomes bigger. I think the discovery process is different, too. I feel like it’s very easy to narrow down the discovery of a technical problem, but narrowing down the discovery of an incident is harder; every couple of hours, you could learn a new piece of information.
[00:07:17.390] – Fred Kass
And that’s not going to happen with a routine outage.
[00:07:18.650] – Jason Pufahl
With a potentially active adversary.
[00:07:20.450] – Steve Maresca
Who is clearing logs, trying to hide, deliberately being evasive. It’s a different environment, but the thought process is similar, I would agree.
[00:07:29.750] – Fred Kass
Except for that need to be more adaptable. That’s, I think, the key differentiating factor.
[00:07:38.150] – Steve Maresca
I think we were talking earlier about making decisions when you have poor information. That is absolutely the case in an incident.
[00:07:47.090] – Jason Pufahl
And conveying the rationale behind the decisions. I’m thinking specifically of the event that we all worked on together. You had the incident. You took some containment steps that arguably reduced functionality at the institution, and you had to be able to justify that. So then you’re managing communication upstream: here is a significant change that we’re going to make to hopefully reduce the negative impact of the incident.
[00:08:14.990] – Jason Pufahl
But people won’t be able to get on the network. People won’t be able to get out of the institution. Describe the communications, up to leadership and even down to staff, to demonstrate, “We’re making the best decision we can with the information we have now.” Right?
[00:08:30.050] – Fred Kass
Yeah. And I think some of that, going back to my old analogy, is the same. I’m a firm believer that you should have a human firewall between your technical staff and your external communication. I’m a firm believer that you have to communicate externally regularly, both to the public, to your user base, so to speak, and to your senior management.
[00:08:54.650] – Fred Kass
But I think what’s different in a cybersecurity incident is you’ve got to involve lawyers, and you’ve got to put a lot more thought into it. There’s something different about the communication flow; you’ve got to make sure you’re meeting the regulatory goals as well as the operational goals and the community-building goals.
[00:09:14.390] – Jason Pufahl
It’s not just, “Hey, give me an update on the event.” You’re obligated to tell some agency something, perhaps, depending on the type of data. So yeah, they’re very different communications in that case.
[00:09:26.150] – Steve Maresca
One example that we’ve encountered regularly is that people will casually throw around, “We’ve had a breach.” Well, okay. You’re using a word that you know is associated with incidents, but that takes on a very specific meaning, especially in the legal context for notification purposes. We have to be very careful when doing our best to communicate broadly, because the language matters in ways that it doesn’t typically.
[00:09:53.390] – Fred Kass
That’s true. And it’s also odd for those of us who have been in the field a long time. I remember when people would get into pedantic arguments about hackers, crackers, and all these kinds of things; they’re different, and you have to know the difference. And then the communications people would come in and be like, “No one cares. What they want to do is communicate out a message that humans understand.”
[00:10:14.450] – Fred Kass
And so, in this case, you’re absolutely right because you still have that communications person in the room being like, nobody understands what you’re saying. And then you have the lawyer in the room saying, “You really want to use these words.” And so that’s an odd, difficult balance.
[00:10:29.450] – Steve Maresca
Right. And you’re trying to strike a secondary balance along the lines of saying, “At the moment, we don’t know one way or the other whether data has been accessed, but we’re doing our best to learn more about the situation. We’ll let you know when we have more info.” You’re trying to walk a very fine line to communicate accurately without making people more anxious than they already are, but at least letting them know that you’re on top of something.
[00:10:56.990] – Fred Kass
Absolutely. And it is a difficult needle to thread, but an important one. And that’s why you have to have that human firewall between the technical people and the ones that are working on that communication problem.
[00:11:12.350] – Jason Pufahl
I think what you did well in your event was balancing communications with restoration, because we’ve definitely been in scenarios where people opted not to communicate at all. And the excuse might be, “Well, we need every resource focused on recovery or restoring operations.” The reality is you’re probably going to be out for a protracted amount of time. It’s critical that people understand, to the degree that you can tell them, why it’s happening and what the potential impact is. Set some expectations, maybe while being vague at the same time, because you don’t know what the outcomes are going to be.
[00:11:52.010] – Jason Pufahl
But in every event, you’re always balancing that recovery with the other activities happening around it. So you’re sometimes forced into a position where you’re actively telling folks, “All right, you’ve been at it for 18 hours. You need to take a break,” while potentially people are saying, “Well, how fast can you get this up?” And those are at odds, right?
[00:12:16.310] – Fred Kass
Yeah. I’m waiting for Steve to come in because the third odd in there is making sure that your environment is secure, and you’re following a diligent process because I’ve heard that from Steve a few times, and it is an important one.
[00:12:28.670] – Steve Maresca
It was a repeating theme. But honestly, I wasn’t going to go there. I was actually thinking more about the anxiety of people beyond IT, because you have staff with no technical role; really, you have a broader constituency in general, plus partner organizations. They perceive there’s an issue. It’s relatively easy to see internally, as staff, faculty, students, you name it.
[00:12:54.350] – Steve Maresca
But also beyond: if email doesn’t work and emails bounce, it’s pretty obvious something’s going on. So electing not to communicate, as we’ve seen in some incidents, really is a damaging act as well, because it lets a bigger, darker cloud gather when arguably you should at least say something.
[00:13:15.710] – Jason Pufahl
Transparency is so important. We advocate that in everything we do in incident response, and every incident is different. There are events that people want to sweep under the rug a little bit and keep quiet. But the reality is, the more people know, I think typically, the better the response is going to be received by the community, by your business partners, and by anybody else you have to tell. It’s definitely worthwhile.
[00:13:38.210] – Steve Maresca
Especially in 2021, most organizations have experienced a security incident of some kind. It no longer has the black mark that it may have had 10 or 12 years ago, and therefore the PR is a bit easier to manage, in my opinion.
[00:13:54.590] – Fred Kass
Yeah, it’s true. It’s unfortunate but true. I do think the other part of what you were talking about that’s interesting to me is the demand for restoring operations versus the ability to do that, and how you set those expectations. In my mind, the most important thing for an organization is its data and protecting that data, and uptime comes second to that.
[00:14:23.270] – Fred Kass
But uptime is really important, too. And so balancing both those needs with, really, the staffing is a very tricky thing: making sure you’re setting that cadence in an emergency so that you can solve the problems in a timely way.
[00:14:43.490] – Fred Kass
And honestly, it’s a little bit of a science and a little bit of an art. In some cases, you act as a field general and hope you’re making the right choices, and that you have the right people in the room to give you advice and react to that.
[00:15:00.710] – Jason Pufahl
I like the field general comment. You made decisions in the middle of the incident to bring in other resources. That potentially comes at a cost because now you have to onboard other people, and you have to make sure that everybody’s comfortable with the fact that you’re bringing somebody in and make sure that it’s effective. There’s a lot of complicated decision-making that happens. And again, you have to communicate to people why you’re doing it, and you’re making sure it actually makes sense.
[00:15:26.030] – Jason Pufahl
They’re not easy to manage. It’s not just about resting people. It’s not just about having those conversations. You want people to be productive, which means, in some cases, conveying information that people don’t want to hear and yet keeping them motivated to keep working, which is hard to do.
[00:15:44.150] – Steve Maresca
I think of a related subject, because we’re dancing around the notion that people are really keen to have services restored, and incident response, at least at the beginning, is largely about pumping the brakes and avoiding that conversation because you’re not ready to have it.
[00:16:03.590] – Steve Maresca
I think immediately about business continuity and disaster recovery planning. And frankly, those plans are just like a battle plan, to extend your analogy: they don’t survive contact with the enemy. Every time, you learn something new about an oversight or maybe over-optimistic beliefs about your ability to restore. And managing that problem in and of itself, especially while trying to defend infrastructure, is a very challenging one.
[00:16:30.830] – Steve Maresca
I’m interested to hear some of your thoughts on that front because I think that every organization dealing with an incident has to deal with that specific problem.
[00:16:41.330] – Fred Kass
Just because no plan is going to be the perfect plan doesn’t mean you shouldn’t have a plan, right? I think the best plans have an adaptability to them. But even so, no matter what you plan for or how many times you plan, you’re going to have a problem, and you’re going to have to react to it.
[00:16:59.510] – Fred Kass
I’m going back to leadership. I think part of that is knowing the strengths of your team, knowing the weaknesses, and knowing how they fit together. And honestly, going back to Jason’s point, you can make a difficult call. But if you have a trusting understanding with the people you work with, they know it’s not personal, and they know it’s not coming from a negative place; it’s coming from a need to make a decision. I guess that brings me to one other thing…
[00:17:29.630] – Fred Kass
There’s a decision paralysis that sometimes happens. And that’s a difficult balance between not making a decision and making the right decision, and knowing when to do which. And again, you’re not always going to be perfect. So I think you’ve got to… I’m a 90 percent person: you’ve got to be close enough to thinking this is the right way to go, but know that if you tried to get to 100 percent, it would take too long. And I think battlefield operations are very different from the way we run our day-to-day businesses, where you usually want to be very sure of something before you do it.
[00:18:06.290] – Jason Pufahl
Right. Absolutely. That often translates into compartmentalizing services in ways that don’t make sense for normal operations but allow you to bring them up in a more measured, safer way during an incident, which you’ll scale back from afterward. It’s part of that calculus; it’s a very different approach from normal day-to-day operations. Making those decisions is hard: you’re often confronted with data that you may not have conclusive evidence about. At best, we’re trusting tools. We’re trusting the defenses that have been put into place, but you may never have better than 60 percent confidence that something is the right path to pursue.
[00:18:57.230] – Fred Kass
Yeah. It’s funny. Again, I think we have a slightly different perspective on the same thing. I’m trusting people, and those people are trusting tools, right?
[00:19:06.470] – Jason Pufahl
Right. [crosstalk 00:19:07]
[00:19:06.830] – Fred Kass
Because it’s a different perspective.
[00:19:11.370] – Jason Pufahl
So I think we’ve got to be at around our limit at this point. If there’s anything I think we’ve conveyed today, it’s that incident response is hugely more complicated than “do you know your systems technically, and can you bring them back?” And I would argue, frankly, that managing that initial response, managing the people through a pretty protracted, stressful event, is really important.
[00:19:39.270] – Jason Pufahl
Developing, to your point, Fred, trust ahead of time. Trust within the community, not just the staff that reports up through IT. People need to trust that when there’s an event going on, people are doing their best to restore things in the most expeditious time they can, even though that might take days, and they need to trust the information they’re getting through the process. I think a lot of that comes through business continuity planning, maybe incident response planning, and having some communications upfront.
[00:20:10.350] – Jason Pufahl
Relationships are key, clearly. I really would argue these are less… They’re technical exercises, but they’re less technical exercises than they are management exercises, managing through a really difficult event. So I don’t know if you guys have any last things you want to say before we end?
[00:20:33.150] – Steve Maresca
I’d say that all incidents have a bit of a long tail attached to them after business returns to normal. And that follow-through is the most important concept to keep in mind, because if there aren’t those final conclusions, the lessons learned, the outward communications to resolve anxieties and the like, and the notifications to affected parties, the incident hasn’t been successfully concluded. That may take a great deal of time after the immediate event has subsided, but in my opinion, it’s just as important as anything else. And I suspect you experienced the same long tail.
[00:21:20.370] – Steve Maresca
Ultimately, I’m interested to know whether it felt concluded in a timely fashion for you or if it was really protracted and frustrating.
[00:21:34.690] – Fred Kass
That’s a difficult question. I think it’s true, and I’m going to go back to… I think you’re absolutely right. There’s the initial part, there’s the full-activity part, and then there’s the long tail at the end. And I think they all have their points of importance. Actually, I’m sorry, I also want to cover Jason’s point: there’s the pre-planning part. So I think [crosstalk 00:21:57]
[00:21:58.510] – Jason Pufahl
I tried to end a minute ago. So we’ll get there.
[00:22:01.390] – Fred Kass
Yes, we did. That’s the trouble with good conversations, I guess.
[00:22:08.050] – Fred Kass
I think the long tail is hard, too, right? All of those parts have challenges. And then, honestly, where the long tail ends and where regular, good cybersecurity practice begins is a tricky balance. They meld into each other in a way that you’re never really sure when you’ve reached that point. But I think that’s part of the maturity of the process. You have to have continuously done those steps in order to be prepared. In some ways, the long tail never ends, because it’s just part of the continuous cycle of improvement in cybersecurity.
[00:22:50.530] – Jason Pufahl
So I am going to end, but I’m going to end it this way. It’s clear that we all have an interest in incident response. I think managing these events effectively, and the space generally, has a lot to carve out in terms of how to make responses effective.
[00:23:06.790] – Jason Pufahl
As always, I’d say, if anybody wants to talk more about it, feel free to reach out to us on LinkedIn at Vancord or on Twitter at VancordSecurity. I’m pretty confident we can get Fred back. Candidly, I was really looking forward to having you here because I knew it would be a good and easy conversation, just an enjoyable one. And I think we had that, so I’m really happy about it. But I think there’s a lot more here. If people are interested, we could do a part two.
[00:23:35.410] – Jason Pufahl
We could. Yeah, we’ll add content. So with that, I appreciate everybody listening, as always. I hope you got some value out of this. If you do want to talk more about it, let us know, and we’ll be happy to do a part two. Thanks, Fred.
[00:23:54.110] – Speaker
Stay digital. Stay resilient. This has been CyberSound.