Episode
56

Finding Privacy in Public Data

What kind of data is attainable to the general public? There is information that you may deem private, like salaries, voter affiliation, and property assessor records. Fortunately, there are options for recourse and limiting the visibility of this data.

Join us on today’s episode of CyberSound as Jason, Steve, and Matt hope to help you and your organization understand where data in the public domain is stored and encourage finding your comfort level.

Episode Transcript

00:01
This is CyberSound. Your simplified and fundamentals-focused source for all things cybersecurity, with your hosts, Jason Pufahl and Steven Maresca.

Jason Pufahl 00:10
Welcome to CyberSound. I’m your host, Jason Pufahl, joining me, as always, Steven Maresca and Matt Fusaro. Hey, guys. So I think we’re going to take the discussion in a slightly different direction, at least as it relates to sort of data privacy and data usage, than maybe we often do, right? I think we have had conversations around, you know, social media and the way the data is used there, and maybe some of your really common shopping sites, but I think we’ve been having a conversation around sort of personal data that’s available in the public domain. And, you know, I think, as we chatted about it, we all had sort of different ideas of what types of data might be available, Steve probably spent the most time sort of utilizing public data for a variety of reasons. But you know, I’d say treat this a little bit as an informational session around, you know, what data does the government simply collect in some cases and make public and, you know, sort of what types of personal data are available that aren’t being mined necessarily by these sort of larger companies? Steve, I know you had an interesting stat around sort of data disclosures or data breaches that I thought was kind of telling in this.

Steven Maresca 01:25
Sure. So, it is a little older, but in 2019, for a country of 320/325 million people like the United States, there were 6 billion some odd sensitive records that were swept up in some sort of disclosure or breach. And it’s an interesting stat, because that’s the data that is regulated, that has an obligatory reporting component.

Jason Pufahl 01:51
So let’s be specific about that, social security numbers,

Steven Maresca 01:54
Yeah, social security numbers, banking information, financial records, health data, stuff that has a law attached to it. You know, in terms of privacy, the United States is not exactly the most fervent of countries in terms of protecting individual pieces of data. States,

Jason Pufahl 02:14
Yeah, no federal law.

Steven Maresca 02:15
Moreso, yeah. But that’s what that has to do with; we’re talking about an entirely separate class of data. And for sake of conversation, let’s just focus on either data that is available merely as a side effect of an individual dealing with a municipality, a state government, what have you, or data that was, perhaps, given away freely as a result of engaging with a service or something like that. It’s information that might not be top of mind. You might think, oh, that’s private, but it’s sitting out in the open for anyone to take.

Jason Pufahl 02:52
So let’s talk specifics about it, at least some of the types of data that’s out there. You know, the one thing that always, at least you and I, Steve, worked in the public sector for a while, right. So one of the big things was salary data, the state certainly, and, you know, that data is used for a whole variety of reasons. You know, probably the big one is, you know, what does public salary versus private salary look like? It’s just data in aggregate. And certainly, there was just a sensitivity around, well, salary data from a state level should be public, and so therefore, it’s made that way. And I think a lot of people don’t understand that their salaries are that visible in the state.

Steven Maresca 03:31
Right, but you know, as a citizen of a state, you’re a taxpayer, it makes sense in abstract that you, as a taxpayer are able to see where your money is being spent. And so in that particular case, yes, salaries were out in the open, you could find anyone. There were lots of news stories written on them, you know, the perhaps perceived to be excessive salaries of some individuals in places that were notable. You know, that’s one area, but others, government specific things, the courts, you know, adjudicated outcomes of some capacity, divorces, you know, they’re sensitive.

Matt Fusaro 04:11
Even as simple as police blotters. Yeah, if you’re arrested, that’ll probably end up in some record somewhere that you’ll be able to look up.

Steven Maresca 04:20
Exactly. And if you’ve ever wondered why Florida is usually in the news for some sort of crazy thing that happened with a criminal or person who’s unhinged, it’s because Florida has a uniquely accessible police blotter for its entire populace. It’s not the case everywhere. But you know, kind of going back to the state regulation of data in this particular context, it’s not the same everywhere. That’s why Florida stands out.

Jason Pufahl 04:46
Is that why Florida stands out? There may be a couple other factors there.

Steven Maresca 04:54
Well, in this case, it’s the Florida man meme of sorts.

Matt Fusaro 04:57
It’s the humidity.

Steven Maresca 04:58
Yeah, we’ll go with that. So other examples, many states, not all, require that voter affiliation be disclosed publicly. There are a lot of reasons for that that are legitimate, right? Primary voting, for example, to help campaigns find their likely voters. That information, if not actually published publicly, is accessible to entities that operate in the political sphere. Naturally, that means it bleeds outward to other places like marketers, and aggregators, and so on. We’re in an interestingly charged political climate right now, there are some people who may not want their affiliations disclosed. A good example is the state of Connecticut, you can fetch, for free, voter affiliation data. Unsurprisingly, it’s in a file format more affiliated with the mainframe, you need to understand how to parse that sort of thing to make use of it. But, it’s there. And there are websites that, at least for this state and several others like it, make that available. Now, people are generally touchy on that. Many of these sites have been delisted from Google because of complaints. But they’re still there, if you know where to look. And the trouble is that if you request that one of those aggregator sites remove your voter affiliation, it doesn’t change the fact that the data is still accessible from the state. It’s maybe less visible, right. But it’s still out there. And that disturbs some people.

Jason Pufahl 06:42
Yeah, and those aggregators, I think one of the challenges is, if you get it from the state or the government, right, that’s one thing, you sort of understand that the data is there. It’s when the aggregators start to collect data from a variety of sources. And then you can get redirected from what feels like a government site to a private website that is collecting data from a variety of sources and is really, frankly, trying to get you to validate a whole variety of data elements and get sort of better clarity on the data that they’re actually combining.

Steven Maresca 07:13
So a good example that we talked about prior to this segment was, you know, the white pages and the yellow pages, venerable institutions serving the public good, hypothetically anyway. People don’t pick up their phones today, if they don’t know the number. There’s good reason for that. And if you go to those websites, which were legitimately useful 15/20 years ago, now, they’re more likely to push you toward, you know, background checks and for pay, you know, yeah, it’s the same sort of problem, ultimately.

Jason Pufahl 07:13
I mean, that whole, the whole business model has changed. So those white pages, yellow pages are good examples of having to adjust your business to meet the needs of today. Who picks up a phone book to look up a landline phone number that’s been in existence for 20 years, like those days are gone. So they had to morph into something. And unfortunately, sort of morphed into something that feels a little underhanded, right?

Steven Maresca 08:08
So you know, other good examples of often public data. And I’ll give two examples that I consider linked, property assessor data, tax data, how Zillow gets the estimated tax, the tax of property. There’s floorplans, there are photos, sometimes interior.

Matt Fusaro 08:32
Yeah, I mean, most of those sites, Realtor.com, Zillow, if you want to get those removed, you have to actually go to those sites and request that the data be removed. And I did that with my house after we bought ours, all that, I think there’s actually still one out there that I have to take care of.

Steven Maresca 08:48
You have to do it to, you know, 1,000 different sites is the problem, though. Because your municipality, at least, some municipalities, some counties provide that publicly, many states don’t, actually, it depends upon open data laws in those locations. But the point is, ultimately, if you want privacy, you may not have it even in what you consider to be your most private area, which is your home. And there are lots of downstream implications of that. So for example, the complementary example that I wanted to give in terms of assessor data is permit data. You can pull a permit to do something that has a high dollar amount, but it might be public. It’s to some degree, an advertisement for anyone who might want to acquire something of that nature from your house. So, you know, it’s the linkage of these things that becomes concerning to people, not in isolation, necessarily.

Jason Pufahl 09:45
Yeah, so let’s dive into that for a second, because I do feel like a lot of the conversations that we have sort of go like this, you know, you’ve got this data, the data is sensitive, whatever we call that data, right, and you should protect it. And very often I’m met with that response of, well, why? Like, what’s the big deal? If somebody were to get floorplans for my house or, you know, in the example that you used a second ago, if you file the permit for something that might have value, what’s the likelihood? And what’s the return? What’s the real risk? And do I really care? And we talked a little bit about salary, talked a little bit about land records, talked a little bit about tax records. In a way, I think it’s valuable to explore what makes that risky, and/or why should somebody care? Because I think that’s important, because that’s where people go naturally.

Steven Maresca 10:36
I mean, to state the obvious here, if you’ve taken a permit, for, you know, something that may have fundamentally a high value, somebody could use that to infer, oh, this is a building where there might be a safe with firearms, oh, this is an electric vehicle I can steal, things of that nature. Theft is easy, right?

Matt Fusaro 10:57
Most of the time, it’s how people will pick you being a target or not. Your average criminal probably isn’t doing something like that; they’re more opportunistic. But if you’ve got things that are of special interest, like you said, especially with permits, you can find those pretty easily.

Steven Maresca 11:16
We talk a lot about how the cost benefit goes into targeting for attackers. It just makes it more efficient, in my opinion. Now, this is a common pattern of our discussions, right? Here’s the data, here’s what it implies, here’s the impact. I don’t want to be all doom and gloom, surely, I’m going to pivot a little bit here,

Jason Pufahl 11:37
I don’t think this is a doom and gloom discussion, really.

Steven Maresca 11:37
No, but it could be perceived that way. Open data of this sort is wildly useful, even for those that might be of the belief that their information is being exposed. So it’s about nuance and balance to some degree. I found my house by using public property card data, GIS data in county databases. It’s a wonder to have it available. You can look up abutters. Oh, I wanted, you know, public land. Yep. It’s right there, cross reference with MLS listings and out pops, you know, five properties of the 5,000 that meet your qualities. There are huge outcomes that candidly, few services are actually making use of today that are probably on the horizon. But you know, it’s like anything else, data can be used to support a perspective, whether nefarious or positive.

Jason Pufahl 11:50
So, how concerned do we think, though, we need to be with some of this data? Certainly there’s people who really feel sensitive to voting records being, well, maybe not voting records, right, but voting history, being public. What advice do we have for people? Like, we certainly don’t have GDPR, we can’t go asking private sites typically to take down things too easily.

Steven Maresca 13:02
Well, yes, and no. And I do want to be clear, we’re not talking about voting records. That’s private, right, just to reassure everyone.

Jason Pufahl 13:11
The voting, the actual votes.

Steven Maresca 13:13
Yes, the actual actions to take and the outcomes really have to do with where the data might reside, you have two choices, the source, municipal state, some public agency, some third party that publishes their data, something like that, or the aggregators. Honestly, I would prefer to go after the aggregators in order to have my information removed, because they’re the path of least resistance for anybody who might make use of it, right. Those are data brokers, generally speaking, and that’s a hugely deep rabbit hole that might lead us towards marketing and things that we’re not really talking about right now. But we may not have GDPR. But we have a lot of companies that anticipate the arrival of such things and generally allow opt-outs, all of the data brokers of note in the United States, while they make it difficult, generally allow you to opt out of your records being published. You have to do them one by one. Just like so many of our conversations about credit bureau monitoring and freezes, you have to go to every single one, there’s no one place to do it, with the exception of like the Do Not Call Registry and things of that sort where there are laws to alter behavior nationwide. Yeah, it does require effort. But if you hit the top of the market, you can generally do what you need to do. Other examples, you know, search engines. If you find that you’ve Googled yourself and you have a record you dislike, there are vehicles to actually remove that officially with the search engine. It’s just a matter of actually going through the effort, perhaps with a lawyer, to file a complaint and have the data removed.

Jason Pufahl 15:04
Right, so taking the time, maybe incurring the cost, but there’s legitimate ways to do it if you want to make the effort to expunge your data, right?

Matt Fusaro 15:13
Yeah, and I think it’s gonna be different for everyone, right, you know, some people, they might not care if a lot of this stuff is out there. And some people, it might affect them quite a bit, especially stuff like your affiliation with a certain party.

Jason Pufahl 15:24
Right, your criminal records maybe, or past criminal records.

Matt Fusaro 15:28
Yeah, especially things like that, you know, for employment and whatnot, you know, employers have a right to go through a lot of that stuff. But there are certain things like, you know, let’s say you have problems with your taxes or something like that, you don’t necessarily want your employer knowing, but sometimes they make decisions on hiring based on that stuff. It’s good to know that that stuff is there, and that they’re not doing something illegal going and finding it out, it’s in the public.

Jason Pufahl 15:51
Right. And, of course, the changing legal landscape, right. I mean, with certainly things being decriminalized that may have landed you jail time in the past, you don’t necessarily want all of that visible.

Steven Maresca 16:03
You make an interesting point, or an aside to the changing legal landscape, though, there are state laws that are beginning to actually hit the books in many states that are actually quite aggressive about GDPR-like capabilities, you know, right to be forgotten type issues. They’re very specific to location. So you know, we can’t give too many specific examples at the moment, but look up the laws of where you live, you may have some vehicles to actually achieve some degree of privacy, which might be otherwise unattainable. You know, I think start with the source when possible. That’s the key. Really good examples of that, we have a lot of educational customers, students are protected under FERPA. It’s a federal law, it covers a great deal of individual information about students, grades, date of birth, you know, course information. Title IX protections have to do with gender, preferred name, legal name, that sort of thing. If you’re a student, and you’re in a college right now, or recently affiliated, you have the ability to request of that institution that your records not be disclosed without your permission. There are many other examples like that. HIPAA is a good one. Generally speaking, HIPAA is not about protecting you individually, it’s about ensuring that there is a documentation trail for release of records between healthcare institutions, but the point is, ultimately, that you as an individual are in control of some of that data.

Matt Fusaro 17:43
Yeah, HIPAA’s a good example of where you don’t have a ton of control sometimes, I mean, if you ever try to apply for things like life insurance, or there’s the MIB group, or whatever it is, it’s an organization that can legally under HIPAA receive all of your medical records to make a coverage decision. So even if you’re protected under something like that, like Steve said, there are ways for them to obtain things that you would consider private records. But as part of that protection, they’re allowed to do that, as long as there’s a trail.

Jason Pufahl 18:15
Right. As long as there’s a trail.

Steven Maresca 18:18
You know, I think it is worth mentioning in passing stuff that we have deliberately tried to stay away from here. But part of this is being cognizant of what information you’re voluntarily giving up when you sign up for services. You might click through an agreement when signing up for something that’s a free service. Maybe think twice about it sometimes, or look at privacy settings regularly, make sure that they look like what you want them to be. There’s lots of information that we give up without thinking merely by browsing websites, or making shopping decisions. And it’s not part of public data really, per se, but it’s in the same sphere of things that can be used by aggregators to reason about you in ways that might be surprising. And I want to mention a very specific example from 10 years ago, there was a slightly apocryphal, but still relatively validated story about a shopper of Target, who was sent flyers with pregnancy and maternity information in them. The parent of a younger person came in rather upset only to discover after the fact that no, the marketing flyer was distributed correctly, and that the shopping habits of that individual did indeed indicate upcoming pregnancy. So, it’s just an example of how data, when aggregated, can be used in ways that are perhaps surprising.

Jason Pufahl 19:49
Sure, but I think you’ve moved ahead a tiny bit right. So you’ve started talking now about social media and marketing and purchasing habits and things like that. In a lot of ways, those are things that you can control. You don’t have to use Facebook, right? You don’t have to buy via Amazon, you can be a little bit more anonymous. If you vote, right, that record is just a record. And so your only alternative there is really not to vote. So I think it is an interesting distinction or delineation there for your data that you generate simply as a matter of being part of society versus opting into, say social media, which is where I think you’re headed a little bit.

Steven Maresca 20:36
Yeah, I think I’ll even cut your sentence shorter, data you generate as an act of being, right. Yeah, ultimately, I think the message is there’s more out there than people think. It’s easy to find. Look for yourself. I mean, look, in all of the venues we’ve talked about as examples, your tax assessor to see what your house looks like, I think you might be surprised.

Jason Pufahl 21:02
Yeah, I think we’ll wrap with what I think is probably a theme that we hit pretty regularly, which is simply, be informed, understand. In this case, right, we’re not talking about private sites, right, understand how data in the public domain is created and stored and what might be out there about you, and you’ll kind of find your comfort level, because I think you do have some recourse if you do want to ask for some of it to be expunged, perhaps, but determine what your comfort level is and make some decisions around the data that you have out there.

Steven Maresca 21:36
And find comfort with the data that does exist, because some of it’s very useful.

Jason Pufahl 21:40
No doubt, right, as we’ve seen, through the way you’ve acted in the past with some of the data or interacted, perhaps, so alright, well, as always, yeah, you know, we hope that everybody got some value out of this, if nothing else, gets you thinking a little bit about, you know, what might be out there, and sort of what is your comfort level? So give that some thought. And if you feel like discussing the topic a little bit more, you know, we intentionally avoided specific locations of data, you know, in some cases, some specific. But if anybody’s curious about what are some of these registries or where data might be stored, sort of hit us up on Twitter or LinkedIn and we’re happy to keep the conversation going. Matt, Steve, thanks.

22:28
Stay vigilant, stay resilient. This has been CyberSound.
