Minimizing The Blast Radius in The Uncharted World of AI Data Security

Jul 24, 2024

Chris Hines

You’ve heard it before: there is no part of your business that doesn’t involve data in some way. Whether you’re the CISO of a large city in the Northeast responsible for providing critical infrastructure to your citizens, of the largest retailer in the world trying to succeed in a crowded e-commerce market, or of a healthcare company working to cure Alzheimer’s, data is the lifeblood of everything. 

I was reminded of this when speaking with the CISO of a large entertainment company that develops streaming content for its customers - and competes with companies like Netflix. He said that if he could quantify how much of his company runs on data, the value would be unreal. 

What he really meant was: tell me which data I should care most about, and I’ll focus my security efforts there, rather than trying to overprotect everything and risk frustrating my employees. After all, data is where the most damage is done. Yet, despite data being the company’s greatest risk, most security teams spend a disproportionate amount of time on ancillary security tools - endpoint security, in-line access controls like Security Service Edge - rather than on knowing and protecting the data itself. In fact, over 60% of organizations feel they lack adequate visibility into the data within their environment.

This would be like focusing all your security efforts on the Brinks trucks, the bank’s ATM locations (the endpoints), and the routes those trucks take - but not protecting the actual bank vault itself. Yes, there will be some cash on the trucks as they travel between branches or retail stores, yes there will be a few bucks in each ATM, and yes it’s always good to make sure the route is fast and secure - but the vault is where the real crown jewels lie.

As AI becomes more prevalent in the workplace, it creates an uncharted challenge for data protectors. AI may be the key to the company’s greatest era of productivity, and it is on the tip of every board member’s tongue - but it also threatens the company given the lack of guardrails in place to govern it. 

Solving this challenge requires a different mindset. Many security leaders I speak with tend to frame the AI conversation around ensuring that sensitive data doesn’t leak out - and revert to asking questions about technologies like DLP (we all know how people feel about DLP). Yes, preventing data from leaking is key, but DLP for AI is still just a natural extension of the way things have always been done, and that’s not good enough in the age of AI. 

An additional, potentially larger, challenge is protecting the custom LLM your team is building from being fed bad data, whether maliciously or accidentally. Imagine a healthcare company whose LLM, built to support its newest Alzheimer’s drug, is fed incorrect data, or accidentally trained on PII or customer data pulled from production environments - testing the limits of AI compliance. This too is part of the new AI security frontier, and why traditional approaches miss the mark.

So where do you start? The main problem I see is that most security leaders simply don’t know their data. Over time, they’ve adopted siloed data discovery solutions that vary in capabilities across structured, unstructured, and semi-structured data. These solutions were designed for on-premises environments and often have little SaaS, public cloud, or PaaS coverage. This toxic combo has made it difficult to determine what crown jewels actually exist within their vault.

Since they don’t know their data, they don’t know who has access to it or how it is being used. It then becomes difficult to connect users (individuals, groups, or non-human identities) to who has access to AI copilots, or to which data is being fed into the LLM. The AI blast radius struggle is very, very real. 

But it’s not all doom and gloom. What organizations should do is take a quick beat, understand their data, and then roll out their Copilot solution. This will make life easier in the long run. Why do I say this? Most security teams don’t realize how Copilots work with regard to default access rules. Tools like Microsoft Copilot, Google Gemini, Amazon SageMaker, Salesforce Einstein - these are just the first examples of what will eventually be a copilot for every app. But, like many of the world’s great innovations, they were not designed with security in mind.

This is how they all work:

  • Employees with access to the Copilot prompt the LLM.
  • That LLM has access to all the same data the employees have access to. These tools are designed to be open, not for zero trust - you have to explicitly restrict access.
  • If the proper access controls are not already in place, you just experienced your first AI breach.

It’s as fast and as simple as that. 
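To make the mechanics concrete, here is a minimal sketch in Python. The permission sets, documents, and the copilot_answer function are hypothetical illustrations of the pattern, not any vendor’s actual API - the point is simply that a copilot’s reach is the prompting user’s reach.

```python
# Hypothetical illustration only: a copilot answers prompts by retrieving
# whatever the prompting user can already read. Names and data are made up.

USER_PERMISSIONS = {
    "alice": {"marketing-plans", "all-employee-salaries", "deal-room"},
}

DOCUMENTS = {
    "marketing-plans": "Q3 campaign calendar...",
    "all-employee-salaries": "Name, title, and salary for every employee...",
    "deal-room": "Target company valuation and deal terms...",
}

def copilot_answer(user: str, prompt: str) -> list[str]:
    """Return every document the copilot may draw on for this user.

    The copilot does not judge sensitivity; it simply inherits whatever
    access the user already has. Over-permissioned users therefore mean
    an over-exposed copilot.
    """
    readable = USER_PERMISSIONS.get(user, set())
    return [DOCUMENTS[doc_id] for doc_id in readable if doc_id in DOCUMENTS]

# Alice asks an innocuous question, but salary and deal data are in scope
# because her excessive permissions were never pruned.
print(copilot_answer("alice", "Summarize our Q3 marketing plan"))
```

Nothing malicious has to happen for this to go wrong; the default behavior alone is enough.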

Minimizing the blast radius - a five-step guide

Step 1

To protect against this, you must have a solution that can help you discover and classify your data. These are capabilities found within data security posture management (DSPM) services. If the vendor has a slick way to use AI to classify your data, even better, since you won’t have to rely solely on RegEx-based classification and can accurately classify data down to the file/object level. Once you classify, you can determine the sensitivity of the data within your digital data “vault.” Remember, not everything in your vault is of equal value (think back to my customer example above).
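As a rough illustration of what discovery and classification boil down to, here is a simplified sketch. The patterns and sensitivity tiers are placeholders, not a production DSPM classifier:

```python
import re

# Simplified, illustrative patterns -- real classifiers combine many signals
# (and increasingly AI models) rather than relying on RegEx alone.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text: str) -> set[str]:
    """Return the set of sensitive data types detected in a file's contents."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

def sensitivity(labels: set[str]) -> str:
    """Map detected data types to a coarse sensitivity tier."""
    if {"ssn", "credit_card"} & labels:
        return "restricted"
    return "internal" if labels else "public"

sample = "Contact: jane.doe@example.com, SSN 123-45-6789"
labels = classify(sample)
print(labels, sensitivity(labels))  # e.g. {'ssn', 'email'} restricted
```

The output of this step is a sensitivity label per file, object, or table - the inventory of what is actually in the vault.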

Step 2

You then need to combine the discovery insights with identity insights. I often refer to this concept as zero trust data access (ZTDA). Maybe I’ve spent too long in the security space - making up my own acronyms now! The idea is to determine who has access to your sensitive data and, of those, who has access to your AI Copilot - and whether that makes sense given the sensitivity of the data itself. This then informs the next step.
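A minimal sketch of that join, assuming hypothetical inventories of dataset sensitivity, dataset access, and Copilot-enabled identities:

```python
# Illustrative only: join data-sensitivity findings with identity access
# to spot where Copilot-enabled identities can reach restricted data.

DATASET_SENSITIVITY = {
    "hr-salaries": "restricted",
    "wiki-pages": "internal",
}

DATASET_ACCESS = {
    "hr-salaries": {"alice", "hr-service-account"},
    "wiki-pages": {"alice", "bob"},
}

COPILOT_USERS = {"alice", "bob"}

def ai_blast_radius() -> list[tuple[str, str]]:
    """List (user, dataset) pairs where a Copilot-enabled identity can reach
    restricted data -- candidates for the access review in Step 3."""
    findings = []
    for dataset, level in DATASET_SENSITIVITY.items():
        if level != "restricted":
            continue
        for user in DATASET_ACCESS.get(dataset, set()) & COPILOT_USERS:
            findings.append((user, dataset))
    return findings

print(ai_blast_radius())  # [('alice', 'hr-salaries')]
```

Each pair that comes out of this join is a question for the business: does this identity really need this data, and does it need it through a copilot?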

Step 3

Adjust access rights within your AI tools - this will put you on much more solid AI security ground!  

Pro-Tip: Think about it. Improving your visibility into what data exists, and its sensitivity, is what allowed you to focus and prioritize - and to minimize the potential blast radius of an AI breach.
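Continuing the sketch from Step 2, a hypothetical helper can turn each over-exposure finding into a remediation item for review. The suggested actions are illustrative, not tool-specific commands:

```python
# Hypothetical follow-on from the Step 2 findings: each (user, dataset)
# over-exposure becomes a concrete access-rights decision for review.

def remediation_plan(findings: list[tuple[str, str]]) -> list[str]:
    """Suggest the least-disruptive fix for each over-exposure finding."""
    actions = []
    for user, dataset in findings:
        actions.append(
            f"Review: revoke {user}'s access to {dataset}, "
            f"or exclude {dataset} from Copilot indexing"
        )
    return actions

for action in remediation_plan([("alice", "hr-salaries")]):
    print(action)
```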

Step 4

Prune your data. Improving your data hygiene by identifying and removing unnecessary data helps further reduce your blast radius. That database left abandoned in an on-premises datastore after a cloud migration initiative - delete it. Multiple backups of the same data - delete some of them. Less data equals less attack surface. It also means lower costs (your company’s infrastructure leader and CFO will love you).
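A small illustrative sketch of two common pruning signals - duplicate content and stale data - assuming you can enumerate file paths and last-accessed timestamps:

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

def file_hash(path: str) -> str:
    """Fingerprint a file's contents so exact duplicates can be grouped."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def find_duplicates(paths: list[str]) -> list[list[str]]:
    """Group files that share identical content -- pruning candidates."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_hash(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]

def is_stale(last_accessed: datetime, days: int = 365) -> bool:
    """True if the data hasn't been touched within the retention window."""
    return datetime.now() - last_accessed > timedelta(days=days)
```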

Step 5

Continuously monitor, detect, and respond to threats across your data environment for ongoing vigilance over your data.
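As one simple illustration of what that ongoing detection can look like (the baselines and thresholds below are arbitrary placeholders), you might flag identities whose reads of restricted data spike well above their norm:

```python
from collections import defaultdict

# Typical restricted-record reads per identity per day (placeholder values).
BASELINE_READS = defaultdict(lambda: 10)
BASELINE_READS["alice"] = 20

def detect_anomalies(daily_reads: dict[str, int], multiplier: int = 5):
    """Yield identities whose restricted-data reads spike above baseline."""
    for user, count in daily_reads.items():
        if count > BASELINE_READS[user] * multiplier:
            yield user, count

today = {"alice": 25, "svc-backup": 900}
for user, count in detect_anomalies(today):
    print(f"ALERT: {user} read {count} restricted records today")
```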

At the end of the day, security leaders should look at AI positively. After all, every CISO has an opportunity to enable their business to adopt AI, but in a way that also allows them to protect the data sitting in their multi-cloud vault.

Just like with any job, having the right tools, and a plan in place, will make your life easier. It’ll work for that home improvement project you’ve been pushing off, and it will help you minimize your AI blast radius too.  

It all starts by first discovering what data exists within your vaults. If your business is calling for an AI Copilot, you should be calling a data security vendor. We over here at Cyera would be happy to help.