Skip to main content

Question Builder Best Practices

LiveRamp Clean Room's Question Builder provides a UI for building questions using SQL queries, run-time parameters, and report configuration.

To ensure that your query is valid and properly configured, consider the following SQL and report configuration guidelines.

Table Names Are Case-Sensitive

For example, @AdLogs is correct and @adlogs will return a run-time error. It's suggested that the table names be copied and pasted with @ aliases from the left-side pane where the dataset types are displayed.

Correct Version:

image idm2688

Incorrect Version:

image idm2693

When the dataset type is not written in a case-sensitive manner, the error message at run-time will show as Failed.

image idm2698

Question Name and Category Are Mandatory Fields

Question Name and Category are mandatory fields for saving the query and moving to the next screen.

image idm2708

Unique Question Names

It is possible to have duplicate question names in your list of clean room questions. To avoid duplication and confusion, create unique question names.

image idm2743

Identically Named Dataset Types

When two dataset types (tables) are named identically, as shown in the following screenshot, LiveRamp Clean Room maps the table referenced that is also referenced in the SQL first. The two or more originally identically named tables can be referenced in the SQL as UserData1, UserData2, and so on.

image idm2758

Using Run-Time Parameters

You can add runtime parameters that act as dynamic variables within your code.

Click + in the Parameters section to add a parameter.

question-parameters.png

For example, to get results for different cuts of data, your partner can specify a run-time parameter value to use for a specific question run. To enable your partner to run the same query on multiple campaigns, you could provide a @campaign_id runtime parameter. Your partner would be asked to specify the ID or set of IDs they want the question to use to filter the data.

Validating a Query

When you click Validate Code in Query Builder, a range of important things are checked, including:

  • Macro formatting: The dataset macros must be properly populated in your SQL query.

  • SQL dialect: Your query must be compatible with the SQL dialect used by the type of clean room. For information, see "SQL Guidelines" below.

  • SQL errors: Indicates an issue with the SQL that requires review and repair of the code. Issues may relate to missing SELECT statements, commas, unknown commands, and so on.

  • Data type: The data types of your inputs and outputs must be aligned with the data types used in your query.

    Adding an incorrect or unexpected data type for a dimension or a measure in the right-side pane in Question Builder will result in a run-time error:

    image idm2753

    You can confirm the data types of various dimensions and measures for the valid datasets (or tables) that are available in the left-side pane under Dataset.

    image idm2703
  • Misaligned dimensions and measures: Sometimes, validating a query will adjust the dimensions and measures, so it is important to confirm the dimensions and measures by removing and re-adding them in the right-side pane.

    In the following example, there are two dimensions and two measures. When clicking the Validate button below the SQL box, the engine running in the background automatically aligns all the fields respectively. If the SQL validator doesn't align the metrics, manually remove the measures from the dimensions list and add them to the measures list.

    image idm2723
  • Permission issues: If your SQL and Question Builder settings are correct and you continue to receive static or run-time errors when you click Validate, contact your LiveRamp representative.

Availability of Dimensions and Measures in the Right-Hand Pane

The Dimensions and Measures associated with the main Select statement must be present in the Question Builder screen's right pane.

A dimension is a field in your data that can be used to aggregate the data, such as campaign name or device type. A measure is a fact that represents the count or sum of a field, such as the count of impressions or the sum of sales.

If you don't specify dimensions and measures, the Report Configuration step will not make those fields available in charts and tables, and the Question Configuration screen will not indicate that you are missing dimensions or measures.

The Validate button and the right-hand pane won't stop you from continuing unless the right pane is empty. For example, consider a main Select statement where there are two dimensions (audience_segment_id and AUDIENCE_SEGMENT_NAME) and two measures (OVERLAP and TOTAL). In the right-side pane, only one dimension and two measures will be displayed. This will not be highlighted as an error on the Question Configuration screen and the missing field (in this case, the audience_segment_id) won't appear on the Report Configuration screen.

To automatically populate the dimensions and measures in the right-hand pane, click Extract Output Fields. For information, see "Validate Dimensions and Measures".

image idm2728

At least one measure and one dimension are mandatory for the query to be valid. It is important to keep both dimensions and measures in the main Select statement so they are available for Report Configuration.

The following screenshot shows the error message you will receive if either the dimensions or measures are empty.

image idm2733

When using field aliases in the main Select statement, all the measures must have an alias so that they get picked up by the Report Configuration layer. The following screenshot shows how measures are aliased in the main Select statement.

image idm2738

Adding Dimensions and Measures in the Right-Hand Pane

If you want a field to represent a dimension instead of a measure, add it to the list of dimensions. You can also move dimensions to the measures list.

If the dimensions and measures are not placed correctly in the respective groups, query results will impact the chart or the format of the fields. For example, if a campaign ID is saved as an integer, then campaign ID 1234 will appear as 1.23k.

Adding incorrect values can also create conflicts if dataset analysis rules are configured for the assigned datasets. For information, see "Set Dataset Analysis Rules".

Space in a Parameter Name

Question Builder does not accept a space in parameter names. Errors will be displayed if spaces are included.

image idm2763

Outputting Multiple Fields in the User List Query Type

A user list query can be outputted with more than one field from the available tables. You can also output a mix of dimensions and measures in a user list query.

SQL Guidelines

Note

LiveRamp is committed to maintaining the integrity of its clean room architecture, so certain SQL statements and Python packages are not yet supported. To request a security evaluation for specific clauses or packages, contact your LiveRamp representative.

Depending on the types of clean room you are using, Question Builder supports certain SQL dialects:

Using a UNION (or UNION ALL) in the Main Select Statement

The main SELECT statement cannot have a UNION set operator. Remove UNION queries that are not within a subquery.

As an alternative, create a common table expression (CTE) where the UNION is followed by using the CTE in the main SELECT statement.

Filtering Joins to Determine CID-RampID Mapping

You can use a join operation to combine rows based on a related column from two or more tables to create a combined row that is used in the query. This can help determine CID-to-RampID recognition rates if you filter out rows with inapplicable values, such as the following:

  • Deleted: Deleted records typically appear as 0 or del% depending on when the deletion occurred.

  • No identifier: Records that lack an identifier typically appear as NULL or UNMATCHED depending on where the data originated.

Example of two tables joined on their CID columns:

COUNT (DISTINCT CID) as recognized_CIDs
FROM client_table t1
INNER JOIN mapping_table t2
    on t1.CID = t2.CID
    WHERE t2.rampid LIKE 'X%' 
      OR  t2.rampid LIKE 'h%'

Where:

  • client_table includes a CID column but lacks a RampID column.

  • mapping_table includes a CID column and a RampID column to establish the relationship between these identifiers on each row.

  • X% refers to various prefixes for RampID types that begin with X.

  • h% refers to the prefix for household RampIDs.

For information about RampIDs, see "RampID".

Caution

When performing RampID-to-RampID joins where transcoding is enabled, you should not include additional filtering. Transcoding includes data transformations that would render an incorrect result if filtering is applied.

Aggregation Thresholds

Dataset owners can create dataset analysis rules to specify a minimum aggregation threshold to preserve privacy based on their organization's risk tolerance. The data owner can set the aggregation threshold for any field considered to be a user identifier field, such as a customer ID, RampID, or loyalty ID.

When you create your query, you can specify a numeric value with an aggregation clause. It is typically recommended that you use the AGG_THRESHOLD macro, which serves as a placeholder for whatever aggregation threshold is specified for the dataset. Omitting an aggregation threshold risks failing validation.

For example, assume you want the SUM of a sales figure each day in transaction data from a partner, and your partner has enforced an aggregation threshold of 75 on their loyalty_id field. Without the aggregation threshold rule, you might attempt to run a query such as the following:

SELECT
@transaction_table.@transaction_date as txn_date,
SUM(@transaction_table.@sales_amount) as total_sales
FROM @transaction_table
WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY txn_date
;

This query will fail validation if an aggregation threshold analysis rule is in place. You could include an aggregation of at least 75 on loyalty_id as follows:

SELECT
@transaction_table.@transaction_date as txn_date,
SUM(@transaction_table.@sales_amount) as total_sales
FROM @transaction_table
WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31' AND
GROUP BY txn_date
HAVING COUNT(DISTINCT @transaction_table.@loyalty_id) >= 75
;

This approach will pass the analysis rules validation. However, you may not always know if your partners' datasets have aggregation threshold rules in place and what their threshold values are. In such cases, the best practice is to use our alternative option for dynamically allowing a query to enforce an aggregation threshold without needing knowledge of what the threshold value should be. This uses the appended _AGG_THRESHOLD macro in place of a hard-coded integer using the following syntax:

@table_name.@field_name_AGG_THRESHOLD

Where: You replace table_name with your dataset macro and field_name with your field macro and append _AGG_THRESHOLD.

The above query would be re-written to support this like so:

SELECT
@transaction_table.@transaction_date as txn_date,
SUM(@transaction_table.@sales_amount) as total_sales
FROM @transaction_table
WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31' AND
GROUP BY txn_date
HAVING COUNT(DISTINCT @transaction_table.@loyalty_id) >= @transaction_table.@loyalty_id_AGG_THRESHOLD
;

When the _AGG_THRESHOLD macro is used, LiveRamp will check whether an aggregation threshold rule is applied to the referenced dataset. If not, we will dynamically replace this value in your query with 0, and if so, we will replace it with the value for the specified aggregation threshold. It is best practice to use the _AGG_THRESHOLD macro in all cases, so you can reuse questions across collaborations and develop questions for multiple partners who may have varying aggregation thresholds depending on their organizations' privacy policies.

Note

The aggregation threshold that the data owner specifies in their dataset analysis rules is different than the Crowd Size option that can be set when editing a clean room. Crowd Size applies a k-min value selected at the clean room level by the clean room's owner to the output values of selected measures. For more information, see "K-min Anonymity Enforcement".

By comparison, the aggregation threshold set by the dataset owner requires validation that your query adequately enforces the threshold rule on input. If a dataset's analysis rules include an aggregation threshold, the query must consider this rule by including the appropriate HAVING or WHERE clause at each point of aggregation.

Analytical Functions

When creating an analytical rule for a dataset, data owners can specify the following aggregate SQL functions that can be run against the dataset for certain fields:

  • AVG: Calculates the average value of a numeric column

  • COUNT: Returns the total number of rows matching specified criteria

  • COUNT DISTINCT: Returns the number of unique values in a specified column

  • SUM: Calculates the total sum of values in a column

  • SUM DISTINCT: Calculates the sum of distinct values in a specified column

  • STDDEV: Returns the biased standard deviation (division by n) of a set of numbers

For example, a dataset owner could create a dataset analysis rule that only allows a query to run if Count Distinct is applied to the specified field. In that case, your query must use Count Distinct on the specified field at least once. Otherwise, the query cannot be run. This is an added privacy protection policy many data owners put in place to limit the risk of leaking privacy for individuals whose data is included in the dataset.

If the dataset owner limits the functions you can use, you will receive an error if these functions are not applied to the specified field when generating aggregates. The failure notice will indicate the needed query modifications. To avoid such errors, communicate with the dataset owner to know what is allowed.

For information about dataset analysis rules, see "Definition of Analytical Rules".

Report Configuration

Updating a Query Erases the Previous Report Runs

When you update and save a query, the previous runs will disappear. As a best practice, if you would like to keep earlier runs of the question, you can clone the query. If question versioning is enabled for your organization, prior runs are retained. For information, see "View Question History and Switch Versions".

Questions can be updated only when the status is "stage". Once published, questions cannot be edited. For every update, the stage versions will change from Stage v1 to Stage v2, Stage 3, and so on. Clone is the only available option once a question's status is "published".

Changing Components in Question Builder or Report Configuration

If you change the chart type or anything in the existing charts on the Report Configuration page, the dataset does not need to be reconnected and approved.

Similarly, if you change any component in the Question Configuration window other than the SQL itself or the Parameters, the datasets do not need to be reconfigured.

Report Filters

When you add a filter to a dimension on the Report Configuration page, after selecting either Single Select or Multi Select, you cannot unselect the option. Additionally, a Filter Type must be selected and cannot be left empty. To remove the filter, deselect the Filter toggle.

If you have a function such as distinct count or average and your filter is at a lower level of granularity than your measure, your calculations may not appear or work correctly.

image idm2768

Selecting the Y-Axis Position When Configuring Reports

Once the y-axis position is selected, you cannot unselect the value in that list. This will be updated in a future release so that the y-axis position can be unselected.