Set Dataset Analysis Rules

Abstract

Guidelines and best practices for analyzing datasets and ensuring data security and compliance

Dataset analysis rules are data governance controls that let a dataset owner optionally specify how a configured dataset can be queried, along with other permissions for that dataset, within a clean room.

The goal of setting analysis rules is to validate questions before dataset assignment, checking with a reasonable (but not absolute) degree of confidence whether each question meets the requirements for how a given dataset can be analyzed.

Note

LiveRamp is committed to maintaining flexibility and usability while enforcing privacy policies. As a result, we allow for arbitrary, complex SQL. We can typically catch most violations of analysis rules. However, we cannot validate with 100% certainty whether a question passes or violates analysis rules. We recommend performing further checks of questions at your discretion.

Dataset analysis rules are supported across all clean room types with some minor caveats for specific types. For more information, see the "Additional Guidelines for Specific Clean Room Types" section below.

LiveRamp clean rooms support analysis rules for datasets used in both list questions and analytical questions, in addition to default rules.

Default Rules

LiveRamp clean rooms enforce a default rule for all datasets associated with your account. This rule prevents the projection of fields labeled as PII in the data connection configuration when executing analytical questions. If a question attempts to include a PII-labeled field as an output in report results, the question run will fail.

Note

To request that this default rule be removed for a specific question, contact your LiveRamp representative.
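The default rule can be pictured as a simple check of a question's output columns against the fields labeled as PII. The following is a minimal sketch of that idea only, not LiveRamp's implementation; the `Field` class, the `projected_pii_fields` helper, and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    is_pii: bool  # hypothetical flag mirroring the PII label in the data connection


def projected_pii_fields(schema: list[Field], output_columns: list[str]) -> list[str]:
    """Return the output columns that are labeled as PII in the schema."""
    pii_names = {f.name for f in schema if f.is_pii}
    return [col for col in output_columns if col in pii_names]


# Hypothetical dataset with one PII-labeled field.
schema = [
    Field("email", is_pii=True),
    Field("state", is_pii=False),
    Field("purchase_total", is_pii=False),
]

# Only non-PII fields in the output: nothing to block.
print(projected_pii_fields(schema, ["state", "purchase_total"]))  # []

# A PII-labeled field in the output: under the default rule, the run would fail.
print(projected_pii_fields(schema, ["email", "state"]))  # ['email']
```

An empty result corresponds to a question that satisfies the default rule; any non-empty result corresponds to a question run that would fail.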

Definition of Analytical Rules

Analytical rules specify the requirements an analytical question must meet in order to run successfully. This rule type supports use cases such as segment analysis, measurement, and attribution. For more information on the options available in analytical rules, see the table below.

Rule	Definition	Options
Join Required	If Yes is selected, this dataset must be joined to another dataset in analytical questions. You might want to use this on HEM | CID or Ramp ID | hashed CID mappings.	No/Yes
Aggregation Threshold	Specifies the minimum number of unique records that must be included in the aggregation calculation for identifier fields. For example, if your organization's policy requires that analyses using customer IDs must include at least 100 individuals, you should set the aggregation threshold for the customer ID field to "100".	Any integer (whole number) value
Allow Join	Indicates whether the column can be used in JOIN clauses to join the dataset to other tables.	Typically used for identifier columns. Allow Join options are all selected by default, so you don't need to adjust them unless there are sensitive fields where you don't want to allow joins.
Allow Projection	Indicates whether the column (and its values) can be included in a question run's output. If this option is selected for identifier fields, it means that you are allowing the value to be included in the output. In most cases, you will not want to allow projections for identifier fields or any other fields that you consider to be sensitive.	Typically used for values used for segmentation (group by) analysis (such as Gender, State, Product Category, etc.). Allow Projection options are all selected by default, so you don't need to adjust them unless there are sensitive fields where you don't want to allow projection.
Analytical Functions	At least one of the selected functions must be run on the relevant field for the field to be projected in the question run's output (all other functions are allowed if one of these functions has been run). If all functions are allowed, at least one function must be run on the field.	ALL, COUNT, COUNT DISTINCT, SUM, SUM DISTINCT, AVG, STDDEV
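To make the aggregation threshold concrete, the sketch below runs an aggregate query against an in-memory SQLite table and keeps only the groups that contain at least a minimum number of distinct customer IDs. It is an illustration under stated assumptions: the table, column names, and threshold value are hypothetical, SQLite stands in for the clean room's query engine, and this section does not say how a clean room handles groups below the threshold (the sketch simply filters them out).

```python
import sqlite3

AGGREGATION_THRESHOLD = 3  # hypothetical value; the table's example uses 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id TEXT, state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("c1", "CA", 10.0),
        ("c2", "CA", 20.0),
        ("c3", "CA", 30.0),
        ("c3", "CA", 5.0),  # repeat purchase: the same individual is counted once
        ("c4", "NY", 40.0),
        ("c5", "NY", 50.0),
    ],
)

# Keep only groups backed by enough distinct individuals.
rows = conn.execute(
    """
    SELECT state,
           COUNT(DISTINCT customer_id) AS individuals,
           SUM(amount) AS total
    FROM purchases
    GROUP BY state
    HAVING COUNT(DISTINCT customer_id) >= ?
    ORDER BY state
    """,
    (AGGREGATION_THRESHOLD,),
).fetchall()

print(rows)  # [('CA', 3, 65.0)]
```

The NY group is backed by only two distinct customer IDs, so it falls below the threshold and does not appear in the result.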

Definition of List Rules

List rules pertain to queries that output row-level lists which may contain identifiers. This rule type supports use cases such as enrichment and segment building.

Rule	Definition	Options
Allow Join	Indicates whether the column can be used in JOIN clauses to join the dataset to other tables.	Typically used for identifier columns. Allow Join options are all selected by default, so you don't need to adjust them unless there are sensitive fields where you don't want to allow joins.
Allow Projection	Indicates whether the column (and its values) can be included in a question run's output.	Typically used for values used for segmentation (group by) analysis (such as Gender, State, Product Category, etc.). Allow Projection options are all selected by default, so you don't need to adjust them unless there are sensitive fields where you don't want to allow projection.
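The two list rule options can be read as per-column allow lists. The following sketch, with a hypothetical `ListRule` class, helper function, and column names, shows how a list question's join columns and output columns might be compared against such a rule. It is a simplified illustration, not the validator LiveRamp uses.

```python
from dataclasses import dataclass, field


@dataclass
class ListRule:
    # Hypothetical model: the columns for which each option is selected.
    allow_join: set[str] = field(default_factory=set)
    allow_projection: set[str] = field(default_factory=set)


def list_rule_violations(
    rule: ListRule, join_columns: list[str], output_columns: list[str]
) -> list[str]:
    """Describe every join or projection the rule does not allow."""
    violations = []
    for col in join_columns:
        if col not in rule.allow_join:
            violations.append(f"join not allowed on {col}")
    for col in output_columns:
        if col not in rule.allow_projection:
            violations.append(f"projection not allowed on {col}")
    return violations


# Joins allowed on the identifier; projection allowed only on segmentation fields.
rule = ListRule(allow_join={"hashed_email"}, allow_projection={"segment", "state"})

print(list_rule_violations(rule, ["hashed_email"], ["segment"]))
# []
print(list_rule_violations(rule, ["hashed_email"], ["hashed_email", "segment"]))
# ['projection not allowed on hashed_email']
```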

How Dataset Analysis Rules Work

When you apply analysis rules to a dataset, clean rooms will attempt to validate whether actions taken on the dataset in a question align with its assigned analysis rules. These validations occur in two places:

Dataset Assignment: When you assign a dataset to a question, LiveRamp checks whether the question is likely to violate the dataset's analysis rules and flags the question as likely to pass or fail. You can take this information into account when assigning the dataset.
Note
Dataset owners with permission to manage datasets also have the right to skip the use of analysis rules if they wish to proceed with assigning the dataset after a question has failed analysis rules.
Question Run: When a user runs a question, LiveRamp will check whether the question violates analysis rules pertaining to the included datasets. If a likely violation is found, the question will not run unless analysis rules have been skipped in the dataset assignment step.
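The interaction between the two validation points can be sketched as a small decision function: a question runs unless one of its datasets has a likely violation that the dataset owner did not skip at assignment. The `Assignment` class and `can_run` function below are hypothetical names used only to illustrate that logic.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    likely_violates: bool  # outcome of the check for this dataset's analysis rules
    rules_skipped: bool    # the dataset owner chose to skip analysis rules


def can_run(assignments: list[Assignment]) -> bool:
    """A question runs unless some dataset has an unskipped likely violation."""
    return all((not a.likely_violates) or a.rules_skipped for a in assignments)


# No likely violation: the question runs.
print(can_run([Assignment(likely_violates=False, rules_skipped=False)]))  # True

# Likely violation and rules not skipped: the question does not run.
print(can_run([Assignment(likely_violates=True, rules_skipped=False)]))  # False

# Likely violation, but the owner skipped analysis rules at assignment: it runs.
print(can_run([Assignment(likely_violates=True, rules_skipped=True)]))  # True
```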

Configure Dataset Analysis Rules

Procedure. To configure analysis rules on a dataset in a clean room:

Enter the appropriate clean room.
If you haven't already done so, provision the dataset for use in the clean room. This allows you to set analysis rules on the configured fields.
From the Clean Room menu, select Datasets.
Click the gear icon next to the Analysis Rules setting on the dataset.
From the dialog that appears, select the type of rule you'd like to add from the tabs on the left:
- For analytical questions, select Analytical Rule and then click + Analytical Rule
- For list questions, select List Rule and then click + List Rule
Note
- To see an example of the type of rule, click the Example dropdown.
- If needed, you can create the other rule type after configuring the first rule.
Enter a rule name.
For analytical rules, perform the following steps and then click Save Rule:
Note
AWS clean rooms allow for a different combination of rules than other clean room types. Depending on the rule you've selected, this may require additional configurations or restrictions for functions allowed in the question.
1. Join Required: Select Yes to require that this dataset be joined to another dataset to be queried or select No to allow this dataset to be queried without joining to another dataset.
  Note
  Most often, you'll be selecting No.
2. Aggregation Threshold: For any fields that contain identifiers (such as a unique customer ID, household ID, or a hashed email), enter any desired aggregation threshold (the minimum number of unique records that must be included in the calculation of an aggregate clause).
3. Allow Join: For each field that you want to be able to use in a join clause within a question, select the check box in its Allow Join column.
  Note
  This is typically used for identifier columns.
4. Allow Projection: For each field that you want to allow in a question run's output, select the check box in its Allow Projection column.
  Note
  Your organization's settings might prevent fields containing identifiers from being projected.
5. Analytical Functions: For each included field, select the desired analytical functions. At least one of the selected functions must be run on the relevant field for the field to be projected in the question run's output (all other functions are allowed if one of these functions has been run). If all functions are allowed, at least one function must be run on the field.
  Note
  LiveRamp allows all common aggregation functions for every field by default. All functions are automatically allowed for fields where projection is allowed.
For list rules, perform the following steps and then click Save Rule:
1. Allow Join: For each field that you want to be able to use in a join operation within a question, select the check box in the Allow Join column.
  Note
  This is typically used for identifier columns.
2. Allow Projection: For each field that you want to allow in a question run's output, make sure that the check box in the Allow Projection column is selected.
  Note
  Your organization's settings might prevent fields containing identifiers from being projected.
If needed, repeat the steps above to create an additional rule for the other type of question.

Note

To edit a previously created rule, click on the gear icon next to the Analysis Rules setting on the dataset and then click Edit.
To delete a previously created rule, click on the gear icon next to the Analysis Rules setting on the dataset and then click the trash can icon.
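The settings collected in the analytical rule procedure above can be pictured as a small data structure. The sketch below is a hypothetical model (the class names, field names, and example values are assumptions, not a LiveRamp API). It also encodes the Analytical Functions requirement as literally stated: at least one of the selected functions must be run on the field, and if all functions are allowed, at least one function of any kind must be run. How that requirement interacts with Allow Projection is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldRule:
    allow_join: bool = True        # selected by default
    allow_projection: bool = True  # selected by default
    aggregation_threshold: Optional[int] = None  # set only for identifier fields
    functions: frozenset = frozenset({"ALL"})    # all functions allowed by default


@dataclass
class AnalyticalRule:
    name: str
    join_required: bool = False
    fields: dict[str, FieldRule] = field(default_factory=dict)


def function_requirement_met(selected: frozenset, used: set[str]) -> bool:
    """Check the Analytical Functions requirement for one field."""
    if "ALL" in selected:
        return len(used) >= 1  # any function satisfies the rule
    return any(fn in selected for fn in used)


# Hypothetical rule: the identifier is not projectable and must be counted.
rule = AnalyticalRule(
    name="measurement",
    fields={
        "customer_id": FieldRule(
            allow_projection=False,
            aggregation_threshold=100,
            functions=frozenset({"COUNT DISTINCT"}),
        ),
        "purchase_total": FieldRule(),
    },
)

id_functions = rule.fields["customer_id"].functions
print(function_requirement_met(id_functions, {"COUNT DISTINCT"}))  # True
print(function_requirement_met(id_functions, {"SUM"}))             # False
print(function_requirement_met(rule.fields["purchase_total"].functions, {"AVG"}))  # True
```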

Additional Guidelines for Specific Clean Room Types

See the additional guidelines below for certain clean room types:

AWS clean rooms: AWS requires at least one analysis rule per question type per dataset.
Snowflake clean rooms: Questions that reference columns by index will lead to errors. If you plan to enforce analysis rules, first determine whether your questions need column index referencing.
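Assuming "column index referencing" here means referring to a result column by its ordinal position (such as GROUP BY 1) rather than by name, the sketch below shows the two equivalent forms of a query. It runs on an in-memory SQLite database purely for illustration; the table and column names are hypothetical, and behavior in a Snowflake clean room may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (state TEXT, visitor_id TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("CA", "v1"), ("CA", "v2"), ("NY", "v3")],
)

# Ordinal form: columns are referenced by position in the SELECT list.
by_index = conn.execute(
    "SELECT state, COUNT(DISTINCT visitor_id) FROM visits GROUP BY 1 ORDER BY 1"
).fetchall()

# Named form: the same query with explicit column names.
by_name = conn.execute(
    "SELECT state, COUNT(DISTINCT visitor_id) FROM visits "
    "GROUP BY state ORDER BY state"
).fetchall()

print(by_index == by_name)  # True
print(by_name)              # [('CA', 2), ('NY', 1)]
```

Under this assumption, writing questions in the named form avoids column index referencing without changing the result.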

Dataset Analysis Rule FAQs

Which LiveRamp clean room patterns support dataset analysis rules?

All clean room types support dataset analysis rules.

How are dataset analysis rules enforced?

For AWS clean rooms, LiveRamp uses AWS' native analysis rules validator to enforce analysis rules. For other clean room patterns, LiveRamp uses its own validator, which runs the question against synthetic data to test whether it will pass a given dataset's analysis rules before executing on the real data.

Can I make an exception to a dataset analysis rule for a given question and dataset pair?

Yes, you may skip analysis rules when assigning a dataset to a question at your own discretion.
