Clean Room FAQs
See the sections below for answers to common LiveRamp Clean Room questions.
Data Connection FAQs
See the FAQs below for common data connection questions.
Why does partitioning matter?
Partitioning optimizes the dataset before you even get to the query stage: because data processing during question runs occurs only on the relevant filtered data, queries scan less data and run faster. For more information, see “Data Connection Partitioning”.
What are the best practices for partitioning?
Data partitioning (dividing a large dataset into smaller, more manageable subsets) is recommended for optimizing query performance and achieving faster processing times. When you indicate partition columns for your data connections, data processing during question runs occurs only on the relevant filtered data, which reduces query cost and execution time. Best practices include:
Partition at the source: When configuring your data connection to LiveRamp Clean Room, define partition columns.
Consider the collaboration context: Make sure that the partition columns make sense for the types of questions that a dataset is likely to be used for. For example:
If you anticipate questions that analyze data over time, partition the dataset by a date field (e.g., event_date or impression_date). This allows queries that filter by date ranges to scan only relevant partitions, reducing processing time and costs.
If the main use case is to analyze data by different brands or products, then partitioning by a brand or product_id column makes sense. This strategy ensures that queries filtering by brand will only access the necessary subset of the data.
Verify column data types: Partitioning supports date, string, integer, and timestamp field types. Complex types (such as arrays, maps, or structs) are not allowed.
Cloud-specific formatting: For cloud storage sources like S3, GCS, and Azure, structure your buckets and file paths in a partitioning format based on the partition column. For BigQuery and Snowflake, make sure columns are indicated as partition keys in your source tables. (See the examples after this list.)
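For illustration, the sketches below show both approaches; the bucket, project, dataset, table, and column names are hypothetical, so adapt them to your own data.

-- Hypothetical S3 layout using Hive-style partitioning on an event_date column:
--   s3://my-bucket/impressions/event_date=2024-03-01/part-00000.parquet
--   s3://my-bucket/impressions/event_date=2024-03-02/part-00000.parquet

-- Hypothetical BigQuery DDL that declares the partition column at the source:
CREATE TABLE `my_project.my_dataset.impressions` (
  impression_id STRING,
  brand STRING,
  user_id STRING,
  event_date DATE
)
PARTITION BY event_date;

With a layout like this in place, any question that filters on event_date scans only the matching partitions instead of the full table.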
For more information, see “Data Connection Partitioning”.
Clean Room Setup FAQs
See the FAQs below for common setup questions.
Where do queries run in a hybrid or confidential computing clean room?
Queries run within LiveRamp's secure Clean Room environment. For hybrid and hybrid confidential computing (HCC) clean rooms, the underlying engine that executes your SQL queries is Apache Spark SQL in a distributed, multi-tenant environment.
How do I choose where to execute to improve performance?
"Where" primarily refers to the type of clean room that your organization selects based on your collaboration goals and the location of your data and your partners' data. Different clean room types use different execution engines, which impact how queries are processed. Optimizing for performance partly depends on your clean room type (such as Snowflake, Google BigQuery, or Hybrid).
Snowflake: Optimize your queries for Snowflake's native SQL engine. For more information, see "Query Data in Snowflake" in Snowflake's documentation.
BigQuery: Queries use the GoogleSQL dialect and the compute typically runs within the clean room owner's BigQuery project. For more information, see "Optimize Queries" in the BigQuery documentation.
Hybrid and Hybrid Confidential Computing (HCC): Queries are executed using Apache Spark SQL within LiveRamp's secure data plane environment. For more information, see "Performance Tuning" in the Apache Spark documentation.
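As a minimal illustration of a Spark-friendly query (the table and column names are hypothetical), filtering on a partition column lets Spark scan only the relevant partitions, and grouping on a small set of columns keeps shuffle and memory pressure low:

-- Assumes the impressions table is partitioned by impression_date
SELECT impression_date, brand, COUNT(*) AS impressions
FROM impressions
WHERE impression_date BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY impression_date, brand;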
Google Cloud BigQuery Clean Room FAQs
See the FAQs below for common Google Cloud BigQuery clean room questions.
BigQuery Clean Room Setup FAQs
Prior to orchestrating BigQuery clean rooms in LiveRamp Clean Room, it is important to configure the necessary permissions in Google Cloud Platform (GCP) and LiveRamp Clean Room, as well as enable certain APIs for your project. For more information, see “Configuring BigQuery Permissions for BigQuery Clean Rooms”.
Can multiple Google service accounts be used in an organization to bring data?
Yes, multiple Google service accounts can be used in an organization to bring data.
BigQuery Clean Room Permissions FAQs
LiveRamp uses the dataset permissions listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms” to create a dataset in the owner/partner project. This dataset holds the authorized view that is shared as a private exchange.
The BigQuery Metadata Viewer role is used to render the data connections screen in the LiveRamp Clean Room UI: it allows LiveRamp to fetch table metadata and display it.
We create an authorized view from the owner/partner table, and it lives in a shared dataset separate from the owner dataset. LiveRamp orchestrates the creation of this shared dataset. Note that this shared dataset is different from the source dataset and is created only to be part of the private exchange in Analytics Hub.
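As a hedged sketch of this pattern (the project, dataset, and table names are illustrative, not the actual names LiveRamp generates), the view is created in the shared dataset and reads from the source dataset:

CREATE VIEW `client_project.lr_shared_dataset.audience_view` AS
SELECT user_id, segment
FROM `client_project.source_dataset.audience`;

-- The view is then authorized on the source dataset (via the BigQuery console or API)
-- so it can read the source tables without exposing the source dataset itself.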
You can create custom roles with the minimum set of permissions listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms” and assign them to the project.
The BigQuery Metadata Viewer permission is expected at the table level, so LiveRamp does not have access to any tables that have not been granted this role; those tables do not appear on the data connections screen.
LiveRamp creates the authorized view in the clean room Owner’s BigQuery project. To facilitate this process, LiveRamp first creates a dataset and then creates the authorized view in it, which is accessible to LiveRamp. The permissions needed to do this are bigquery.datasets.create, bigquery.datasets.get, and bigquery.datasets.update. LiveRamp can only create, update, or get the dataset it creates for the authorized view. The LiveRamp service account does not have access to list any other datasets in the Owner’s BigQuery project.
If some columns are masked, the Owner service account would need to grant the BigQuery Data Viewer role at the table level. If there is no masking, the BigQuery Metadata Viewer role (listed in “Configuring BigQuery Permissions for BigQuery Clean Rooms”) is sufficient.
When the owner/partner configures the data connection in a clean room, LiveRamp creates a dataset, an authorized view, a private exchange, and a listing under the exchange, and then adds the LiveRamp service account as an Analytics Hub subscriber to the listing.
The authorized view, private exchange, and listing are created in the client project. At the time of clean room question execution, the LiveRamp service account subscribes to the private listing, creates a BigQuery job, and, after the job is complete, unsubscribes from the listing. This is what is known as “ephemeral access”.
BigQuery Clean Room Data FAQs
Partitioning your BigQuery tables is highly recommended for best performance.
BigQuery Clean Room Compute FAQs
We support compute in the clean room Owner’s project today through a BigQuery clean room parameter called “Billing Project ID”. This parameter is set on the clean room configuration screen and determines where compute happens and where the BigQuery jobs are created.
BigQuery Clean Room Billing FAQs
The Billing Project ID of the clean room determines which project gets billed for the job execution.
Any BigQuery project with the appropriate billing setup can be used as the Billing Project ID.
Permissions FAQs
See the FAQs below for common permissions questions.
How are permissions managed in LiveRamp Clean Room?
Clean room owners control how internal users and partners engage with clean rooms, questions, and outputs, and they can manage permissions at both the clean room level and the question level.
Clean room-level permissions define what users can do within the clean room itself.
Question-level permissions control interactions with specific questions within a clean room.
An organization's administrators manage partner-level and role-level permissions:
Partner level: Clean room owners specify which clean room permissions are available to their partners to assign.
Role level: Clean room partners can create custom user roles based on the available clean room permissions.
For more information, see "Managing Clean Room Permissions" and "Question Management".
Question and Query FAQs
See the FAQs below for common questions about clean room questions and queries.
What can I do to improve the performance of my question runs?
Perform data validation in your own environment before connecting your data source to LiveRamp Clean Room. Key areas to check include expected row counts and fill rates for key fields.
The fields listed in LiveRamp's sample schemas can be used to prepare your data for your clean room collaboration use cases. Pay close attention to column names, data types, any required hashing for PII, and so on. Having the proper file formatting in place will make the remainder of your setup much more seamless, and validating your data with QA queries in your own environment (such as the sketch after this list) can reduce costs and save time once you've connected your data to LiveRamp.
Enable partitioning on date columns and key string columns for the datasets assigned to the question so that only the relevant data is scanned during execution.
Optimize your queries based on the cloud environment of your clean room type.
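As an example of the kind of QA query described above (the table and field names are hypothetical, and the SQL is written to run in most dialects), the following checks the row count and the fill rate of a hashed PII field:

SELECT
  COUNT(*) AS row_count,
  SUM(CASE WHEN email_sha256 IS NOT NULL THEN 1 ELSE 0 END) AS email_filled,
  AVG(CASE WHEN email_sha256 IS NOT NULL THEN 1.0 ELSE 0.0 END) AS email_fill_rate
FROM crm_audience;

If the row count or fill rate differs from what you expect, fix the data at the source before creating the data connection.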
What SQL engine should I optimize for?
The SQL engine you should optimize for depends on the specific type of clean room you are using, because different clean room types may support different SQL dialects:
Hybrid and Hybrid Confidential Computing (HCC) clean rooms use Apache Spark SQL because of its support for distributed processing of large datasets.
Google BigQuery clean rooms use GoogleSQL.
Snowflake clean rooms support standard SQL.
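To illustrate how the dialects differ, here is the same 30-day reach query sketched in two dialects (the project, table, and column names are hypothetical); note the different date-arithmetic syntax:

-- GoogleSQL (BigQuery clean rooms):
SELECT brand, COUNT(DISTINCT user_id) AS users
FROM `my_project.my_dataset.events`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY brand;

-- Spark SQL (hybrid and HCC clean rooms):
SELECT brand, COUNT(DISTINCT user_id) AS users
FROM events
WHERE event_date >= date_sub(current_date(), 30)
GROUP BY brand;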
What is LiveRamp's default CPU allotment for a question run?
Question Builder's "Advanced Question Settings" includes "Default" and "Large" options for the "Processing capacity needed to run the question". The "Default" processing capacity is appropriate for most queries and datasets". "Large" should only be used if an optimized query takes longer than 8 hours to run because it increases compute costs.
Why do queries need optimizing if they already work in my cloud data warehouse?
LiveRamp's Hybrid and Hybrid Confidential Compute (HCC) clean rooms use Apache Spark SQL as the underlying engine for executing queries, not the native query engines used by cloud data warehouses like Google BigQuery (which uses GoogleSQL) or Snowflake (which supports standard SQL). While your SQL query might work in your native cloud data warehouse, the LiveRamp Clean Room execution environment is different.
Spark is designed for the distributed processing of large datasets. Optimizing queries for Spark involves considering how data is partitioned and processed across nodes. Inefficient queries in this distributed environment can lead to performance bottlenecks, high memory consumption, and potential failures.
What could be causing memory issues with clean rooms, such as functions failing after 3 hours?
Failures after a long duration, such as 3 hours, are often linked to memory issues that arise during question runs due to how the query is structured. They are rarely caused by functions alone; other aspects of the query, such as cross joins on large datasets or multiple CTEs reused throughout the query, are more likely culprits. Failures can also occur for a variety of other reasons, such as errors within custom Python code, insufficient processing capacity for the warehouse size, a lack of partitioning, transient interruptions, or timeouts.
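To illustrate the cross-join point, here is a minimal sketch (the table and column names are hypothetical):

-- Risky in a distributed engine: a join with no join condition produces a cross join,
-- which materializes every row combination and can exhaust executor memory.
SELECT COUNT(*)
FROM exposures e
CROSS JOIN conversions c;

-- Better: an equi-join on a shared key lets Spark shuffle and match rows efficiently.
SELECT COUNT(*)
FROM exposures e
JOIN conversions c
  ON e.user_id = c.user_id;

Similarly, if a CTE is referenced several times in one query, Spark may recompute it for each reference, so consolidating repeated CTE logic can also reduce memory pressure.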
If troubleshooting steps, such as query optimization and adjusting warehouse size, do not resolve the issue, contact your LiveRamp representative to explore additional options.
Why are my runs queuing for so long?
Long queuing times for question runs can be related to the performance and resource availability within the Clean Room execution environment. If many complex or large queries are submitted simultaneously, there might be insufficient compute resources immediately available, leading to a queue. Problems with the data itself (such as missing data or problematic formatting) can also delay the start of a question run if the system has trouble accessing the necessary data from its source connections.
LiveRamp Clean Room AI FAQs
How LiveRamp Clean Room Uses AI
LiveRamp Clean Room provides several opportunities to interact with LiveRamp Clean Room AI, Clean Room's generative AI helper bot. The goal of LiveRamp Clean Room AI is to help our users reduce time-to-value in performing tasks within the Clean Room Console: it predicts which analyses would be beneficial to your organization, suggests how questions, user lists, and alerts can be built, and helps you describe questions to your partners for approval.
LiveRamp recognizes that AI is a broad discipline, and it is often hard to know what data is used to generate AI-powered responses. In the spirit of transparency, we'd like to explain how LiveRamp Clean Room AI works.
Clean Room leverages models available via the OpenAI Platform as the foundation models for providing responses to Clean Room Console users. For more information, see "AI Disclosure".
The LiveRamp Clean Room integration uses OpenAI's API to provide responses based on the data OpenAI's models were trained on. To fine-tune the model, Clean Room leverages only the schemas (names, fields, and data types) of the data connections already provided within your organization, which makes responses and suggestions more relevant. It's important to note:
OpenAI has no access to messages you write in LiveRamp Clean Room AI or data you have connected to Clean Room Console.
Clean Room does not use your organization's underlying data to inform the models.
Your messages to LiveRamp Clean Room AI are not used to fine-tune the model.
Are my chat histories stored?
Yes, Clean Room securely stores chat histories as a reference for our Customer Success team in cases where they may need to provide additional support or make improvement requests.
While the accuracy of outputs from AI tools is improving at a rapid rate, generative AI is still a nascent field. This means we cannot guarantee that all responses will be accurate, so we always provide users with the opportunity to correct an AI-generated response before using it.
While we hope LiveRamp Clean Room AI is a useful tool for most users, it is completely optional. If you’d prefer not to use it, we've provided the option to hide LiveRamp Clean Room AI's helper bot.
How to Interact with LiveRamp Clean Room AI
You can interact with LiveRamp Clean Room AI in the same manner you would interact with a chatbot on any other website. We recommend asking the helper bot questions and prompting it with messages describing what you hope to achieve. For example:
I want a SQL query that will allow me to understand the overlap between my audience and my partner's audience for CRM and exposure log data.
Give me a query that will tell me the ROAS for my March campaign across all destinations where I've activated data from Clean Room.
Set up an alert that will email me anytime the average basket size across my SKUs at retail locations in Boston dips below $10.00.
If you have any concerns with LiveRamp Clean Room AI's responses or are having trouble using the tools effectively, contact your LiveRamp representative.
AI Disclosure
OpenAI ChatGPT API Integration and Privacy
LiveRamp uses OpenAI, L.L.C.'s ("OpenAI") ChatGPT API to help you create alerts, queries, and natural language code descriptions. OpenAI's ChatGPT API uses large language models, and the content it creates may not be fully accurate or reliable. We want you to know:
LiveRamp’s agreement with OpenAI prevents OpenAI from using your data to train its models. OpenAI describes its data usage policies on its website.
In addition to the industry-standard technical, administrative, and physical controls that LiveRamp uses to protect your data, OpenAI's requests and responses are encrypted using transport layer security (TLS), and the OpenAI API is SOC 2 Type 2 compliant.