AI is fundamentally driven by data. It is used to train and tune models, enable agents to plan and reason, and fuel interactions with end users. However, that same data can also create risks, such as sensitive data leaks, unwanted data collection, and data misuse.
In the AI era, organizations need more than security controls that rely on manual tagging and simple keyword matching. Effective data protection now depends on understanding context.
To help you meet this challenge, Google Cloud’s Sensitive Data Protection (SDP) now uses advanced AI technology to power a new set of context classifiers (including medical and finance) and image object detectors (such as faces and passports). By understanding the context of data — even within images and rich documents — our enhanced rules engine can identify and mask sensitive information more effectively, helping to ensure that your AI agents access only the data they need.
Now generally available, these new SDP capabilities allow you to safely unlock the value of your data at every stage of the AI journey, from initial training and fine-tuning to real-time agent responses. By selectively removing sensitive identifiers such as personally identifiable information (PII), you can feed your models high-quality data without the associated risks.
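For text, this selective removal is exposed through the SDP de-identification API. The following Python sketch is a minimal example, not a full integration; the project ID and sample record are placeholders. It masks detected names and email addresses while leaving the rest of the record intact:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
# Placeholder project ID; use your own project and location.
parent = "projects/your-project-id/locations/global"

record = "Reach Alex Rivera at alex@example.com about order 4421."

response = client.deidentify_content(
    request={
        "parent": parent,
        # Detect a couple of common PII infoTypes.
        "inspect_config": {
            "info_types": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]
        },
        # Replace each finding with its infoType name, e.g. [EMAIL_ADDRESS].
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "replace_with_info_type_config": {}
                        }
                    }
                ]
            }
        },
        "item": {"value": record},
    }
)

# Prints something along the lines of:
# "Reach [PERSON_NAME] at [EMAIL_ADDRESS] about order 4421."
print(response.item.value)
```

Replacing each finding with its infoType name keeps the record structurally useful for training while stripping the identifier itself.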
Here are a few ways you can integrate these new SDP capabilities into your AI strategy.
AI tuning and data sanitization in Vertex AI
When you tune a model like Gemini with your own business data, you can introduce new risks that are hidden in that data. On Vertex AI, Sensitive Data Protection can help mitigate these risks through managed data discovery, which continuously scans your organization or selected projects for sensitive markers, including those within unstructured image data.
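Discovery runs from a configuration that tells SDP what to scan. As a rough sketch, assuming the google-cloud-dlp client library and a placeholder project ID (the BigQuery-only target here is illustrative; discovery supports other data sources as well), a project-level discovery config might look like this:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
# Placeholder project; discovery configs can also be created at the org level.
parent = "projects/your-project-id/locations/global"

discovery_config = dlp_v2.DiscoveryConfig(
    display_name="profile-training-data",
    status=dlp_v2.DiscoveryConfig.Status.RUNNING,
    targets=[
        dlp_v2.DiscoveryTarget(
            big_query_target=dlp_v2.BigQueryDiscoveryTarget(
                # Profile every BigQuery table not claimed by another target.
                filter=dlp_v2.DiscoveryBigQueryFilter(
                    other_tables=dlp_v2.AllOtherBigQueryTables()
                )
            )
        )
    ],
)

config = client.create_discovery_config(
    request={"parent": parent, "discovery_config": discovery_config}
)
print(f"Created discovery config: {config.name}")
```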
For example, SDP discovery can find credit card numbers, faces, and photo ID cards using advanced optical character recognition (OCR) and object detection. When sensitive data is discovered, rather than discarding it and reducing the value of your training datasets, you can use SDP to generate redacted versions.
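That redaction step can be scripted with the redact_image method in the SDP client libraries. The sketch below assumes a local JPEG and a placeholder project ID, and redacts any credit card numbers found in the image; the face and ID-card detectors described above are assumed to surface as additional infoTypes that you would add to the same inspect_config:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
# Placeholder project ID.
parent = "projects/your-project-id/locations/global"

with open("package_photo.jpg", "rb") as f:
    image_bytes = f.read()

response = client.redact_image(
    request={
        "parent": parent,
        "inspect_config": {
            # CREDIT_CARD_NUMBER is a long-standing infoType. The new
            # face/photo-ID object detectors are assumed to appear as
            # additional infoTypes in this list.
            "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
        },
        # Black-box each matching finding in the image.
        "image_redaction_configs": [
            {"info_type": {"name": "CREDIT_CARD_NUMBER"}},
        ],
        "byte_item": {
            "type_": dlp_v2.ByteContentItem.BytesType.IMAGE_JPEG,
            "data": image_bytes,
        },
    }
)

# The response carries the redacted image bytes, so the original file
# can remain untouched in your source bucket.
with open("package_photo_redacted.jpg", "wb") as f:
    f.write(response.redacted_image)
```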
Consider the image below, which shows a damaged package next to a person. SDP lets you keep the image for training while selectively obscuring the face, or the entire person, to help preserve privacy.
[Image: a damaged package next to a person, shown with the face selectively redacted]