
What Is Data Masking in Salesforce?

Data masking replaces real personal data in Salesforce sandbox environments with realistic-looking substitute values. The substitutes are indistinguishable from real data in format and structure, but they contain no information about any real person. When developers, QA engineers, or contractors work in a masked sandbox, they see data that behaves like production without being production.

Why Salesforce sandboxes contain real data

Full-copy sandbox refreshes create a complete copy of your production Salesforce org's data, including every contact record, every case, every activity, and every custom object. If your production org contains 50 million records with real names, addresses, SSNs, and financial data, your full-copy sandbox contains the same 50 million records with the same real data.

This is by design: developers need realistic data to test against. But realistic doesn't have to mean real. Data masking gives you the best of both: data that behaves like production (same volume, same relationships, same field types) without exposing any real person's information.

What data masking does at the field level

Field-level masking replaces the value in a specific field with a realistic substitute according to a configured rule:

  • Names → replaced with valid-format names from a name library (realistic names, not random characters)
  • Email addresses → replaced with valid-format emails on a safe domain (user@masked.example.com)
  • Phone numbers → replaced with valid-format numbers that match the regional format
  • SSNs → replaced with correctly formatted values that belong to no real person (SSNs carry no checksum digit, so format and an unissued number range are what matter)
  • Dates of birth → replaced with dates in a realistic range that preserves age distribution
  • Financial amounts → replaced with amounts in a realistic range that preserves income distribution

The key property of good masking is that it preserves data utility: testing workflows produce the same results on masked data as they would on real data, because the distributions and relationships are maintained.
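The field rules above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular masking product works: the secret key, the tiny name library, and every function name are assumptions made for the example. It uses a keyed hash so the same input always produces the same substitute, and it generates SSN-shaped values in the 900-999 area range, which the SSA does not issue as SSNs.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical values for illustration; a real tool would use a managed secret
# and a much larger name library.
SECRET_KEY = b"example-sandbox-secret"
FIRST_NAMES = ["Alex", "Jordan", "Maria", "Priya", "Samuel", "Wei"]
LAST_NAMES = ["Garcia", "Johnson", "Kim", "Nguyen", "Okafor", "Smith"]


def stable_number(value: str, purpose: str) -> int:
    """Derive a repeatable 64-bit integer from a value using a keyed hash."""
    message = f"{purpose}:{value}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def mask_name(full_name: str) -> str:
    """Pick a first and last name from the library."""
    n = stable_number(full_name, "name")
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(email: str) -> str:
    """Produce a valid-format address on the safe domain."""
    n = stable_number(email.lower(), "email")
    return f"user{n % 10**8:08d}@masked.example.com"


def mask_phone(phone: str) -> str:
    """Replace every digit but keep punctuation, so the layout survives.

    Note: this also replaces country and area codes.
    """
    n = stable_number(phone, "phone")
    out = []
    for ch in phone:
        if ch.isdigit():
            out.append(str(n % 10))
            n //= 10
        else:
            out.append(ch)
    return "".join(out)


def mask_ssn(ssn: str) -> str:
    """Return an SSN-shaped value in the 900-999 area range (never issued as an SSN)."""
    n = stable_number(ssn, "ssn")
    area = 900 + n % 100
    group = 1 + (n // 100) % 99        # 01-99
    serial = 1 + (n // 9900) % 9999    # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_birthdate(dob: date) -> date:
    """Shift the date by up to 180 days either way, roughly preserving age."""
    n = stable_number(dob.isoformat(), "dob")
    return dob + timedelta(days=n % 361 - 180)
```

Shifting dates by a bounded offset, rather than drawing a fresh random date, is one simple way to keep the age distribution close to the original. A production rule set would likely need tighter guarantees, for example keeping minors as minors.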

Why Salesforce's native Data Mask isn't enough

Salesforce offers a native Data Mask product (generally licensed as a paid add-on, so confirm the cost against your contract). It addresses the basic use case: replacing some fields with random values. But enterprise Salesforce orgs quickly hit limitations:

  • Formula fields, picklists, and checkboxes are not maskable with native Data Mask
  • Custom object relationships are not maintained: masking one object can break related records
  • Performance at scale: native Data Mask is slow on orgs with tens of millions of records
  • No automation: native Data Mask requires a manual trigger; it doesn't run automatically on sandbox refresh
  • No DevOps integration: no API to trigger from Copado, Gearset, or GitLab
  • No post-refresh automation: email suppressions and callout blockers are not managed

For organizations with complex data models, large data volumes, or DevOps pipelines, native Data Mask requires significant workarounds or is simply not sufficient.

How data masking handles object relationships

Salesforce data models are not flat: a Contact has related Activities, Cases, Opportunities, and custom records. Masking a name in the Contact record doesn't mask that name in related records that store it directly.

Enterprise data masking handles this by masking across related objects as a consistent set: when a Contact's name is replaced, every related record that references that name is updated with the same substitute value. This preserves relational integrity: the masked database is consistent, and queries that join across objects return coherent results.
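One way to get that consistency is to compute each substitute once per Contact and reuse it wherever the name is stored. The sketch below assumes records are plain dictionaries and that Cases carry a denormalized copy of the contact's name in a hypothetical custom field, `Contact_Name__c`. The key, the name list, and the function names are illustrative only.

```python
import hashlib
import hmac

SECRET_KEY = b"example-sandbox-secret"  # hypothetical secret
SUBSTITUTE_NAMES = ["Alex Garcia", "Jordan Kim", "Maria Nguyen", "Samuel Okafor"]


def substitute_name(real_name: str) -> str:
    """Map a real name to a substitute; the same input always gives the same output."""
    normalized = real_name.strip().lower().encode("utf-8")
    digest = hmac.new(SECRET_KEY, normalized, hashlib.sha256).digest()
    return SUBSTITUTE_NAMES[int.from_bytes(digest[:8], "big") % len(SUBSTITUTE_NAMES)]


def mask_related(contacts, cases):
    """Mask Contact names and the copies of those names stored on Cases.

    Returns new lists; the input records are not modified.
    """
    # Decide each Contact's substitute once, keyed by record Id.
    masked_by_id = {c["Id"]: substitute_name(c["Name"]) for c in contacts}

    masked_contacts = [{**c, "Name": masked_by_id[c["Id"]]} for c in contacts]

    masked_cases = []
    for case in cases:
        # Prefer the parent Contact's substitute; fall back to hashing the
        # stored name when the parent is not in this batch.
        fallback = substitute_name(case["Contact_Name__c"])
        name = masked_by_id.get(case["ContactId"], fallback)
        masked_cases.append({**case, "Contact_Name__c": name})

    return masked_contacts, masked_cases
```

Looking up the substitute by Contact Id, rather than re-hashing the text stored on each Case, keeps related records aligned even when the denormalized copy has drifted slightly from the Contact's current name.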

What happens to RTBF-deleted records in sandbox

When a contact is deleted from production after a right-to-erasure request (the "right to be forgotten", or RTBF), that record lives on in every sandbox copy until the next sandbox refresh. If your sandbox refresh cycle is every 6 months, you may hold a technically non-compliant copy of GDPR-protected personal data in your sandbox for up to 6 months.

Data masking addresses this: when the sandbox refreshes, it reflects the current production state, so records deleted in production are absent from the masked sandbox. This closes a compliance gap that most organizations don't realize exists.
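To make the gap concrete, the sketch below checks which erased records still exist in a sandbox and how long one of them stays exposed. It assumes you keep a log of record Ids erased under RTBF requests and can export the Ids present in the sandbox. Both inputs and both function names are hypothetical.

```python
from datetime import date


def lingering_erased_ids(erased_ids, sandbox_ids):
    """Return Ids that were erased in production but still exist in the sandbox."""
    return sorted(set(erased_ids) & set(sandbox_ids))


def exposure_days(erased_on: date, next_refresh_on: date) -> int:
    """Days an erased record remains in the sandbox until the next refresh."""
    return max((next_refresh_on - erased_on).days, 0)


# Example: two erased contacts, one of which is still in the sandbox copy.
still_present = lingering_erased_ids(
    erased_ids=["003A", "003B"],
    sandbox_ids=["003B", "003C", "003D"],
)
print(still_present)                                        # ['003B']
print(exposure_days(date(2024, 1, 10), date(2024, 6, 30)))  # 172
```

Comparing against an erasure log, rather than against the full set of production Ids, avoids flagging records that were legitimately created in the sandbox for testing.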

Data masking vs. anonymization vs. pseudonymization

These terms are often used interchangeably but mean different things under GDPR:

  • Anonymization: data is irreversibly altered so that the person can never be re-identified. Anonymized data is outside GDPR scope. True anonymization in a relational database is difficult to achieve without destroying data utility.
  • Pseudonymization: data is replaced with a substitute (pseudonym) such that re-identification is possible only with additional information held separately. Pseudonymized data is still personal data under GDPR. Most data masking for sandbox environments is pseudonymization.
  • Masking: in practice, a broad term that includes both approaches. For sandbox security purposes, the goal is that a data breach of the sandbox environment exposes no real personal data. Whether that's achieved through anonymization or pseudonymization is secondary.
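The difference between the first two approaches can be illustrated with a small sketch. It is a simplified example, not a legal test: whether a given dataset actually counts as anonymized under GDPR depends on what else remains in it. The field names (`Email`, `Age`, `AgeBand`) and function names are assumptions made for the example.

```python
import secrets


def pseudonymize(records, field):
    """Replace a field with random tokens and return (masked_records, lookup).

    The lookup (token -> real value) is the "additional information" that
    makes re-identification possible, so it must be stored separately.
    """
    token_for = {}   # real value -> token
    lookup = {}      # token -> real value
    masked = []
    for record in records:
        real = record[field]
        if real not in token_for:
            token = "P-" + secrets.token_hex(8)
            token_for[real] = token
            lookup[token] = real
        masked.append({**record, field: token_for[real]})
    return masked, lookup


def anonymize(records, drop_fields, age_field="Age"):
    """Drop direct identifiers and generalize age into a ten-year band.

    No lookup is kept, so nothing here can reverse the change.
    """
    result = []
    for record in records:
        kept = {
            key: value
            for key, value in record.items()
            if key not in drop_fields and key != age_field
        }
        if age_field in record:
            low = (record[age_field] // 10) * 10
            kept["AgeBand"] = f"{low}-{low + 9}"
        result.append(kept)
    return result
```

With the lookup in hand, a pseudonymized record can be traced back to a person, which is why it remains personal data. The anonymized output keeps no such path, at the cost of losing detail that some tests may need.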

Key Takeaways

Data masking replaces real PII with realistic substitutes that preserve data utility for testing.

Salesforce's native Data Mask does not handle formula fields, picklists, custom objects at scale, or DevOps automation.

Good masking preserves relational integrity: consistent values across related objects.

RTBF-deleted records in production must not persist in sandbox copies: this is a GDPR compliance requirement.

Masking for sandbox environments is distinct from production-side privacy controls like encryption or access controls.
