Question Builder Best Practices
LiveRamp Clean Room's Question Builder provides a UI for building questions using SQL queries, run-time parameters, and report configuration.
To ensure that your query is valid and properly configured, consider the following SQL and report configuration guidelines.
Table Names Are Case-Sensitive
For example, @AdLogs is correct and @adlogs will return a run-time error. It's suggested that the table names be copied and pasted with @ aliases from the left-side pane where the dataset types are displayed.
Correct Version:
![]() |
Incorrect Version:
![]() |
When the dataset type is not written in a case-sensitive manner, the error message at run-time will show as Failed.
![]() |
Question Name and Category Are Mandatory Fields
Question Name and Category are mandatory fields for saving the query and moving to the next screen.
![]() |
Unique Question Names
It is possible to have duplicate question names in your list of clean room questions. To avoid duplication and confusion, create unique question names.
![]() |
Identically Named Dataset Types
When two dataset types (tables) are named identically, as shown in the following screenshot, LiveRamp Clean Room maps the table referenced that is also referenced in the SQL first. The two or more originally identically named tables can be referenced in the SQL as UserData1, UserData2, and so on.
![]() |
Using Run-Time Parameters
You can add runtime parameters that act as dynamic variables within your code.
Click
in the Parameters section to add a parameter.
For example, to get results for different cuts of data, your partner can specify a run-time parameter value to use for a specific question run. To enable your partner to run the same query on multiple campaigns, you could provide a @campaign_id runtime parameter. Your partner would be asked to specify the ID or set of IDs they want the question to use to filter the data.
Validating a Query
When you click
in Query Builder, a range of important things are checked, including:Macro formatting: The dataset macros must be properly populated in your SQL query.
SQL dialect: Your query must be compatible with the SQL dialect used by the type of clean room. For information, see "SQL Guidelines" below.
SQL errors: Indicates an issue with the SQL that requires review and repair of the code. Issues may relate to missing SELECT statements, commas, unknown commands, and so on.
Data type: The data types of your inputs and outputs must be aligned with the data types used in your query.
Adding an incorrect or unexpected data type for a dimension or a measure in the right-side pane in Question Builder will result in a run-time error:
You can confirm the data types of various dimensions and measures for the valid datasets (or tables) that are available in the left-side pane under Dataset.
Misaligned dimensions and measures: Sometimes, validating a query will adjust the dimensions and measures, so it is important to confirm the dimensions and measures by removing and re-adding them in the right-side pane.
In the following example, there are two dimensions and two measures. When clicking the Validate button below the SQL box, the engine running in the background automatically aligns all the fields respectively. If the SQL validator doesn't align the metrics, manually remove the measures from the dimensions list and add them to the measures list.
Permission issues: If your SQL and Question Builder settings are correct and you continue to receive static or run-time errors when you click Validate, contact your LiveRamp representative.
Availability of Dimensions and Measures in the Right-Hand Pane
The Dimensions and Measures associated with the main Select statement must be present in the Question Builder screen's right pane.
A dimension is a field in your data that can be used to aggregate the data, such as campaign name or device type. A measure is a fact that represents the count or sum of a field, such as the count of impressions or the sum of sales.
If you don't specify dimensions and measures, the Report Configuration step will not make those fields available in charts and tables, and the Question Configuration screen will not indicate that you are missing dimensions or measures.
The Select
statement where there are two dimensions (audience_segment_id and AUDIENCE_SEGMENT_NAME) and two measures (OVERLAP and TOTAL). In the right-side pane, only one dimension and two measures will be displayed. This will not be highlighted as an error on the Question Configuration screen and the missing field (in this case, the audience_segment_id) won't appear on the Report Configuration screen.
To automatically populate the dimensions and measures in the right-hand pane, click Validate Dimensions and Measures".
. For information, see "![]() |
At least one measure and one dimension are mandatory for the query to be valid. It is important to keep both dimensions and measures in the main Select
statement so they are available for Report Configuration.
The following screenshot shows the error message you will receive if either the dimensions or measures are empty.
![]() |
When using field aliases in the main Select
statement, all the measures must have an alias so that they get picked up by the Report Configuration layer. The following screenshot shows how measures are aliased in the main Select
statement.
![]() |
Adding Dimensions and Measures in the Right-Hand Pane
If you want a field to represent a dimension instead of a measure, add it to the list of dimensions. You can also move dimensions to the measures list.
If the dimensions and measures are not placed correctly in the respective groups, query results will impact the chart or the format of the fields. For example, if a campaign ID is saved as an integer, then campaign ID 1234 will appear as 1.23k.
Adding incorrect values can also create conflicts if dataset analysis rules are configured for the assigned datasets. For information, see "Set Dataset Analysis Rules".
Space in a Parameter Name
Question Builder does not accept a space in parameter names. Errors will be displayed if spaces are included.
![]() |
Outputting Multiple Fields in the User List Query Type
A user list query can be outputted with more than one field from the available tables. You can also output a mix of dimensions and measures in a user list query.
SQL Guidelines
Note
LiveRamp is committed to maintaining the integrity of its clean room architecture, so certain SQL statements and Python packages are not yet supported. To request a security evaluation for specific clauses or packages, contact your LiveRamp representative.
Depending on the types of clean room you are using, Question Builder supports certain SQL dialects:
AWS supports a limited set of SQL functions. For information, see "Overview of SQL in AWS Clean Rooms".
Azure confidential computing supports Apache Spark SQL. For information, see "Spark SQL".
BigQuery supports the GoogleSQL dialect. For information, see "Introduction to SQL in BigQuery".
Databricks supports ANSI-compliant SQL and Python. For information, see "ANSI compliance in Databricks Runtime" and "Databricks SQL Connector for Python".
Hybrid clean rooms support Spark SQL. Hybrid clean rooms with Clean Compute also support PySpark. For information, see "PySpark Overview".
Snowflake supports Snowflake SQL and Python via Snowpark. For information, see "Query Data in Snowflake" and "Snowpark Developer Guide for Python".
Using a UNION (or UNION ALL) in the Main Select Statement
The main SELECT
statement cannot have a UNION
set operator. Remove UNION
queries that are not within a subquery.
As an alternative, create a common table expression (CTE) where the UNION
is followed by using the CTE in the main SELECT
statement.
Filtering Joins to Determine CID-RampID Mapping
You can use a join operation to combine rows based on a related column from two or more tables to create a combined row that is used in the query. This can help determine CID-to-RampID recognition rates if you filter out rows with inapplicable values, such as the following:
Deleted: Deleted records typically appear as
0
ordel%
depending on when the deletion occurred.No identifier: Records that lack an identifier typically appear as
NULL
orUNMATCHED
depending on where the data originated.
Example of two tables joined on their CID columns:
COUNT (DISTINCT CID) as recognized_CIDs FROM client_table t1 INNER JOIN mapping_table t2 on t1.CID = t2.CID WHERE t2.rampid LIKE 'X%' OR t2.rampid LIKE 'h%'
Where:
client_table
includes a CID column but lacks a RampID column.mapping_table
includes a CID column and a RampID column to establish the relationship between these identifiers on each row.X%
refers to various prefixes for RampID types that begin with X.h%
refers to the prefix for household RampIDs.
For information about RampIDs, see "RampID".
Caution
When performing RampID-to-RampID joins where transcoding is enabled, you should not include additional filtering. Transcoding includes data transformations that would render an incorrect result if filtering is applied.
Aggregation Thresholds
Dataset owners can create dataset analysis rules to specify a minimum aggregation threshold to preserve privacy based on their organization's risk tolerance. The data owner can set the aggregation threshold for any field considered to be a user identifier field, such as a customer ID, RampID, or loyalty ID.
When you create your query, you can specify a numeric value with an aggregation clause. It is typically recommended that you use the AGG_THRESHOLD
macro, which serves as a placeholder for whatever aggregation threshold is specified for the dataset. Omitting an aggregation threshold risks failing validation.
For example, assume you want the SUM of a sales figure each day in transaction data from a partner, and your partner has enforced an aggregation threshold of 75 on their loyalty_id field. Without the aggregation threshold rule, you might attempt to run a query such as the following:
SELECT @transaction_table.@transaction_date as txn_date, SUM(@transaction_table.@sales_amount) as total_sales FROM @transaction_table WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31' GROUP BY txn_date ;
This query will fail validation if an aggregation threshold analysis rule is in place. You could include an aggregation of at least 75 on loyalty_id as follows:
SELECT @transaction_table.@transaction_date as txn_date, SUM(@transaction_table.@sales_amount) as total_sales FROM @transaction_table WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31' AND GROUP BY txn_date HAVING COUNT(DISTINCT @transaction_table.@loyalty_id) >= 75 ;
This approach will pass the analysis rules validation. However, you may not always know if your partners' datasets have aggregation threshold rules in place and what their threshold values are. In such cases, the best practice is to use our alternative option for dynamically allowing a query to enforce an aggregation threshold without needing knowledge of what the threshold value should be. This uses the appended _AGG_THRESHOLD
macro in place of a hard-coded integer using the following syntax:
@table_name
.@field_name
_AGG_THRESHOLD
Where: You replace table_name
with your dataset macro and field_name
with your field macro and append _AGG_THRESHOLD
.
The above query would be re-written to support this like so:
SELECT @transaction_table.@transaction_date as txn_date, SUM(@transaction_table.@sales_amount) as total_sales FROM @transaction_table WHERE txn_date BETWEEN '2025-01-01' AND '2025-01-31' AND GROUP BY txn_date HAVING COUNT(DISTINCT @transaction_table.@loyalty_id) >= @transaction_table.@loyalty_id_AGG_THRESHOLD ;
When the _AGG_THRESHOLD
macro is used, LiveRamp will check whether an aggregation threshold rule is applied to the referenced dataset. If not, we will dynamically replace this value in your query with 0, and if so, we will replace it with the value for the specified aggregation threshold. It is best practice to use the _AGG_THRESHOLD
macro in all cases, so you can reuse questions across collaborations and develop questions for multiple partners who may have varying aggregation thresholds depending on their organizations' privacy policies.
Note
The aggregation threshold that the data owner specifies in their dataset analysis rules is different than the Crowd Size option that can be set when editing a clean room. Crowd Size applies a k-min value selected at the clean room level by the clean room's owner to the output values of selected measures. For more information, see "K-min Anonymity Enforcement".
By comparison, the aggregation threshold set by the dataset owner requires validation that your query adequately enforces the threshold rule on input. If a dataset's analysis rules include an aggregation threshold, the query must consider this rule by including the appropriate HAVING or WHERE clause at each point of aggregation.
Analytical Functions
When creating an analytical rule for a dataset, data owners can specify the following aggregate SQL functions that can be run against the dataset for certain fields:
AVG: Calculates the average value of a numeric column
COUNT: Returns the total number of rows matching specified criteria
COUNT DISTINCT: Returns the number of unique values in a specified column
SUM: Calculates the total sum of values in a column
SUM DISTINCT: Calculates the sum of distinct values in a specified column
STDDEV: Returns the biased standard deviation (division by n) of a set of numbers
For example, a dataset owner could create a dataset analysis rule that only allows a query to run if Count Distinct is applied to the specified field. In that case, your query must use Count Distinct on the specified field at least once. Otherwise, the query cannot be run. This is an added privacy protection policy many data owners put in place to limit the risk of leaking privacy for individuals whose data is included in the dataset.
If the dataset owner limits the functions you can use, you will receive an error if these functions are not applied to the specified field when generating aggregates. The failure notice will indicate the needed query modifications. To avoid such errors, communicate with the dataset owner to know what is allowed.
For information about dataset analysis rules, see "Definition of Analytical Rules".
Report Configuration
Updating a Query Erases the Previous Report Runs
When you update and save a query, the previous runs will disappear. As a best practice, if you would like to keep earlier runs of the question, you can clone the query. If question versioning is enabled for your organization, prior runs are retained. For information, see "View Question History and Switch Versions".
Questions can be updated only when the status is "stage". Once published, questions cannot be edited. For every update, the stage versions will change from Stage v1 to Stage v2, Stage 3, and so on. Clone is the only available option once a question's status is "published".
Changing Components in Question Builder or Report Configuration
If you change the chart type or anything in the existing charts on the Report Configuration page, the dataset does not need to be reconnected and approved.
Similarly, if you change any component in the Question Configuration window other than the SQL itself or the Parameters, the datasets do not need to be reconfigured.
Report Filters
When you add a filter to a dimension on the Report Configuration page, after selecting either Single Select or Multi Select, you cannot unselect the option. Additionally, a Filter Type must be selected and cannot be left empty. To remove the filter, deselect the Filter toggle.
If you have a function such as distinct count or average and your filter is at a lower level of granularity than your measure, your calculations may not appear or work correctly.
![]() |
Selecting the Y-Axis Position When Configuring Reports
Once the y-axis position is selected, you cannot unselect the value in that list. This will be updated in a future release so that the y-axis position can be unselected.