
Privacy-Preserving Techniques and Clean Room Results

LiveRamp Clean Room implements security and privacy by design and leverages an array of techniques for privacy preservation, empowering clean room data owners and their partners to build a secure, privacy-compliant solution for data collaboration.

It is critical to protect consumer privacy and the rights of data owners at every stage of data collaboration. This topic addresses privacy-preserving techniques applied to clean room output results. Results data consist of the approved, privacy-safe output of collaboration that conforms to the mutual requirements of data owners.

LiveRamp Clean Room uses privacy-preserving techniques that are configurable depending on the type of analysis and sensitivity of the dataset. These approaches include k-min anonymity enforcement and noise injection.

K-min Anonymity Enforcement

LiveRamp Clean Room applies a customer-specified k-min threshold (the "Crowd Size" data control parameter) to every user metric reported by any query executed within a clean room.


For example, consider the following query. If the crowd size were set to 100, the last row in the result set would not be displayed to the user because its count (88) does not meet the minimum threshold.

SELECT usm.audience_segment, COUNT(*) AS impressions
FROM adlog_impressions ali, user_segment_map usm
WHERE ali.user_id = usm.user_id
GROUP BY 1
ORDER BY 2 DESC;

| AUDIENCE_SEGMENT | IMPRESSIONS |
|---|---|
| High Household Income | 184795 |
| Owns RV | 99264 |
| ... | ... |
| Owns Stocks & Bonds | 21166 |
| Casino Gambler | 20980 |
| Children in Household | 88 => NOT RETURNED/DISPLAYED |
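The thresholding in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not LiveRamp's implementation: the function name `apply_crowd_size` is hypothetical, and treating the threshold as inclusive (a count equal to the crowd size is kept) is an assumption.

```python
from typing import Iterable


def apply_crowd_size(
    rows: Iterable[tuple[str, int]], crowd_size: int
) -> list[tuple[str, int]]:
    """Keep only rows whose metric meets the minimum crowd size.

    Assumption: the threshold is inclusive (count >= crowd_size is kept).
    """
    return [(segment, count) for segment, count in rows if count >= crowd_size]


# Rows taken from the example result set above (elided rows omitted).
rows = [
    ("High Household Income", 184795),
    ("Owns RV", 99264),
    ("Owns Stocks & Bonds", 21166),
    ("Casino Gambler", 20980),
    ("Children in Household", 88),
]

filtered = apply_crowd_size(rows, crowd_size=100)
for segment, impressions in filtered:
    print(f"{segment}: {impressions}")
```

With a crowd size of 100, the "Children in Household" row is dropped and the other four rows are returned unchanged.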

Noise Injection Using Laplacian Noise

In addition to k-min anonymity enforcement, noise injection (controlled by the "Data Decibel" data control parameter) perturbs the metrics shown to the user by adding random Laplacian noise. This protects against differencing: a user who runs two nearly identical queries cannot reliably isolate sensitive information from the known difference between their results. You specify the amount of noise using the Data Decibel parameter in the Create New Clean Room wizard.


The injected noise is drawn at random from a Laplace distribution. The following perturbed function satisfies Epsilon (ε) differential privacy:

M_L(x, f(·), ε) = f(x) + Lap(0, Δf/ε)

Where 0 is the center of the distribution, Δf/ε is its scale, and Δf (also known as sensitivity) is 1 for count metrics. With Δf fixed at 1, the scale reduces to 1/ε, so the Epsilon value alone determines the noise level: the smaller the Epsilon, the more noise.
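As a rough illustration of the formula, the following minimal Python sketch adds Laplace noise to a count. It is not the UDF that LiveRamp appends to queries. The function names are hypothetical, and the sampler relies on the standard fact that the difference of two independent Exp(1) draws follows Lap(0, 1).

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale): the difference of two Exp(1) draws is Lap(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(
    true_value: float,
    epsilon: float,
    sensitivity: float = 1.0,  # sensitivity is 1 for count metrics
    rng: Optional[random.Random] = None,
) -> float:
    """Return true_value + Lap(0, sensitivity / epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # seeded only to make the demo repeatable
for epsilon in (1.0, 0.1, 0.01):
    noisy = laplace_mechanism(14237, epsilon, rng=rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")
```

Because the scale is sensitivity divided by Epsilon, smaller Epsilon values spread the noisy counts further from the true value.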

This methodology is implemented as a Python user-defined function (UDF) appended to clean room queries. Question templates allow clean room owners to specify which metrics should be treated with random noise.

The Epsilon value is set by the position of the Data Decibel slider. A higher Data Decibel yields an Epsilon value closer to 0 (more noise), and a lower Data Decibel yields an Epsilon value closer to 1 (less noise).

The following example scenario uses a base value of 14,237 before noise injection and summarizes 1,000,000 perturbed values at each of three Data Decibel slider settings: Low (1), Medium (50), and High (99). Low and medium noise do not result in much variation, whereas high noise significantly changes the values.

| Metric | Low Noise | Medium Noise | High Noise |
|---|---|---|---|
| count | 1,000,000.00 | 1,000,000.00 | 1,000,000.00 |
| mean | 14,237.01 | 14,236.99 | 14,237.59 |
| std | 14.15 | 141.36 | 1,415.43 |
| min | 14,103.00 | 12,758.00 | 1,648.00 |
| 25% | 14,230.00 | 14,168.00 | 13,544.00 |
| 50% | 14,237.00 | 14,237.00 | 14,237.00 |
| 75% | 14,244.00 | 14,306.00 | 14,931.00 |
| max | 14,366.00 | 15,523.00 | 29,134.00 |
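A summary like the one above can be approximated with a short simulation. The sketch below is illustrative only. The scales 10, 100, and 1000 are assumptions inferred from the std row (the standard deviation of a Laplace distribution is √2 times its scale) and are not a documented mapping from the Data Decibel slider. The sample size is reduced to 100,000 to keep the run short.

```python
import math
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale): the difference of two Exp(1) draws is Lap(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def summarize(base: float, scale: float, n: int, rng: random.Random) -> dict[str, float]:
    """Summary statistics of n independently perturbed copies of base."""
    values = [base + laplace_noise(scale, rng) for _ in range(n)]
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


rng = random.Random(7)  # seeded only to make the demo repeatable
# Hypothetical scales, inferred from the std row (std = sqrt(2) * scale).
results = {scale: summarize(14237, scale, 100_000, rng) for scale in (10, 100, 1000)}
for scale, stats in results.items():
    print(f"scale={scale}: std={stats['std']:.2f} (theory {math.sqrt(2) * scale:.2f})")
```

Under these assumptions the mean and median stay close to the base value at every setting, while the spread (std, quartiles, min, and max) grows in proportion to the scale.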


Considerations

When applying k-min anonymity enforcement and noise injection, consider the following:

  • Balance privacy protection against query utility: a larger crowd size or more noise strengthens privacy, but it can also suppress rows or distort metrics until the results are no longer useful.

  • When noise injection is enabled, rerunning a query for the same date range will produce different results. Therefore, you must account for the expected volume of runs when choosing the noise level.

  • You cannot configure noise injection for metrics that require a sensitivity value (Δf) higher than 1.
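One reason the volume of reruns matters is that averaging many independently perturbed results cancels out much of the noise. The sketch below illustrates this general property of Laplace noise. It is not a description of how LiveRamp Clean Room handles reruns, and the function names and the scale of 100 are hypothetical.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale): the difference of two Exp(1) draws is Lap(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def averaged_result(true_value: float, scale: float, runs: int, rng: random.Random) -> float:
    """Average of `runs` independent noisy reruns of the same query."""
    return statistics.fmean([true_value + laplace_noise(scale, rng) for _ in range(runs)])


def error_spread(scale: float, runs: int, trials: int, rng: random.Random) -> float:
    """Standard deviation of the averaged result's error across many trials."""
    return statistics.pstdev([averaged_result(0.0, scale, runs, rng) for _ in range(trials)])


rng = random.Random(11)  # seeded only to make the demo repeatable
single = error_spread(scale=100, runs=1, trials=2000, rng=rng)
averaged = error_spread(scale=100, runs=100, trials=2000, rng=rng)
print(f"error spread of a single run:     {single:.1f}")
print(f"error spread of a 100-run average: {averaged:.1f}")
```

The error of the average shrinks roughly with the square root of the number of runs, so a noise level that looks adequate for a single run protects less if the same query can be rerun many times.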