Advanced Configuration: Dealing with large data set (LDV)


Got millions of records in your Salesforce Org? You’ll need to chunk millions of records into smaller data-set.

Example: There are 20 million contacts. It could take tens of hours to mask them, the masking operation can also encounter SOQL time-outs. The optimal approach is to slice them into five 4 million data sets. This way all of these five data sets will process in parallel. This also prevents any SOQL time-out exceptions.

To slice large data sets into smaller data sets, DataMasker ‘Filter Criteria’ can be used. Filter Criteria contains the where clause of a SOQL. The challenge with utilizing the ‘where’ clause in this situation, is that when you attempt to perform the SOQL operation on a chunk of records, it can time out. This makes it difficult to guess the right number of records to mask.

Method -1


This is where Postman comes in. Postman is an API platform for building and using APIs. It simplifies each step of the API lifecycle and streamlines collaboration, helping you create better APIs faster.

Below are the steps on how to utilize Postman:

Step 1: Refer article to download Postman and connect to your Salesforce Org. https://trailhead.salesforce.com/content/learn/projects/quick-start-connect-postman-to-salesforce/set-up-and-connect-postman

Step 2: Once the connection is made, you now have the ability to submit simple query batch jobs using Bulk API. This will make it easier for you to get the count as it would not time out. For this refer to the following article, linked here.

https://trailhead.salesforce.com/content/learn/projects/quick-start-connect-postman-to-salesforce/set-up-and-connect-postman

https://trailhead.salesforce.com/en/content/learn/modules/api_basics/api_basics_bulk

Step 3: Once a job is submitted, go to ‘Navigate’ -> ‘Salesforce Dashboard Setup’ -> ‘Bulk Data Load Jobs.’

Now, you can see the status of the job, and prevent your masking from being impacted by time-outs.

Method -2

This makes it difficult for you to guess the right number of records to mask. Here Workbench comes to the rescue. Workbench is an API platform for building and using APIs. It simplifies each step of the API lifecycle and streamlines collaboration, helping you create better APIs—faster.

Enable ability to execute Bulk API queries through an exposed DataMasker API

pcldm.DM_DataMaskingService.createBulkQueryJob(‘____’);

For eg.,

pcldm.DM_DataMaskingService.createBulkQueryJob(‘select id from Contact limit 3000000’);

Goto workbench, login with your salesforce org and click on the ‘Apex Execute’.

Click on ‘Execute’.

After Execute the query it will give the result

The User has to check in Bulk Data Loads Jobs.

Method -3

Enable the ability to execute Bulk API queries through an exposed DataMasker API via Developer Console

pcldm.DM_DataMaskingService.createBulkQueryJob(‘select id from employee__c limit 2000000’);

Click on ‘Execute’.

The User has to check in Bulk Data Loads Jobs.